Abstract

We present an extension to Standard ML, called SMLSC, to support separate compilation. The
system gives meaning to individual program fragments, called units. Units may depend on one
another in a way specified by the programmer. A dependency may be mediated by an interface
(the type of a unit); if so, the units can be compiled separately. Otherwise, they must be compiled
in sequence. We also propose a methodology for programming in SMLSC that reflects code
development practice and avoids syntactic repetition of interfaces. The language is given a formal
semantics, and we argue that this semantics is implementable in a variety of compilers.


This  material  is  based  on  work  supported  in  part  by  the  National  Science  Foundation  under  grant  0121633 
Language  Technology  for  Trustless  Software  Dissemination  and  by  the  Defense  Advanced  Research  Projects  Agency 
under  contracts  F196268-95-C-0050  The  Fox  Project:  Advanced  Languages  for  Systems  Software  and  F196228-91- 
C-0168  The  Fox  Project:  Advanced  Development  of  Systems  Software.  Any  opinions,  findings,  conclusions  and 
recommendations  in  this  publication  are  the  authors’  and  do  not  reflect  the  views  of  these  agencies. 

This  report  supersedes  CMU-CS-06-104. 


A Separate Compilation Extension to Standard ML (Revised and Expanded)
Carnegie Mellon University, School of Computer Science, Pittsburgh, PA 15213
Report date: 17 September 2006
Approved for public release; distribution unlimited.


Keywords:  Standard  ML,  separate  compilation,  incremental  compilation,  types 


Introduction 


We  propose  an  extension  to  Standard  ML  called  SMLSC.  SMLSC  supports  separate  compilation  in 
the  sense  that  it  gives  a  static  semantics  to  individual  program  fragments,  which  we  call  units.  A 
unit  may  depend  on  other  units,  and  can  be  type-checked  independently  of  those  units  by  specifying 
what  it  expects  of  them.  These  expectations  are  given  in  the  form  of  interfaces  for  those  other 
units.  When  unit  A  is  checked  against  another  unit  B  via  a  mediating  interface,  we  need  not  have 
access  to  B  at  all.  Therefore  we  say  that  A  is  separately  compiled  (SC)  against  B. 

It  is  also  useful  to  allow  unit  A  to  depend  on  another  unit  B  without  specifying  an  interface  for 
B.  In  this  case,  the  only  way  to  derive  the  context  necessary  to  check  A  is  to  first  check  B  and  read 
off  its  actual  interface.  In  this  scenario  we  say  that  A  is  incrementally  compiled  (IC)  against  B. 

Units  may  be  compiled  and  then  linked  together  to  satisfy  dependencies.  The  compiled  form  of 
a  unit  or  set  of  linked  units  is  called  a  linkset.  A  linkset  may  be  further  linked  with  other  linksets. 
If  a  linkset  has  no  remaining  dependencies,  then  it  can  be  transformed  into  an  executable  program. 

The goal of this work is to consolidate and synthesize previous work on compilation management
for ML into a formally defined extension to the Standard ML language. The extension itself is
syntactically and conceptually very simple. A unit is a series of Standard ML top-level declarations,
given a name. To the current top-level declarations such as structure and functor we add an
import declaration that is to units what the open declaration is to structures. An import declaration
may optionally specify an interface for the unit, in which case we are able to compile separately
against that dependency; with no interface we must compile incrementally. Compatibility with
existing compilers, including whole-program compilers, is assured by making no commitment to the
precise meaning of “compile” and “link”: a compiler is free to limit compilation to elaboration and
type checking, and to perform code generation as part of linking.

Sections  1  and  2  summarize  our  main  design  principles,  and  provide  an  overview  of  the  system. 
In  Section  2  we  give  a  small  example  of  its  use.  (We  give  a  larger  example  in  Appendix  G.)  The 
semantics,  formulated  in  the  framework  of  the  Harper-Stone  semantics  of  ML  [12,  13],  is  given  in 
Section  3.  We  give  an  alternative  semantics  in  the  framework  of  The  Definition  of  Standard  ML  [16] 
in  Section  4.  Some  implementation  issues  are  discussed  in  Section  5.  We  conclude  with  a  discussion 
of  related  work  in  Section  7. 

1  Design  Principles 

A  language,  not  a  tool.  We  propose  an  extension  to  the  Standard  ML  language  to  support 
separate  compilation,  rather  than  a  tool  to  implement  it.  The  extension  is  defined  by  a  semantics 
that  extends  the  semantics  of  Standard  ML  to  provide  a  declarative  description  of  the  meanings  of 
the  language  constructs.  The  semantics  provides  a  clear  correctness  criterion  for  implementations 
to  ensure  source-level  compatibility  among  them. 

Flexibility. A compilation unit consists of any sequence of top-level declarations, including signature
and functor declarations.^1 However, since Standard ML lacks syntactically expressible
principal signatures, some units cannot be separately compiled from one another. We therefore
support incremental, as well as separate, compilation for any unit. This means that the interface
of a unit can either be inferred from its source (incremental compilation) or explicitly specified
(separate compilation) at the programmer’s discretion.

^1 Consequently, units cannot be identified with Standard ML structures.

Simplicity. The design provides only the minimum functionality of a separate compilation system.
It omits any form of compilation parameters, conditional compilation directives, or compiler
directives. We leave for future work the specification of such additional machinery.

Conservativity.  The  semantics  of  Standard  ML  should  not  be  changed  by  the  introduction  of 
separate  compilation.  In  particular,  we  do  not  permit  “circular  dependencies”  or  similar  concepts 
that  are  not  otherwise  expressible  in  Standard  ML.  This  ensures  that  compilers  should  not  be 
disturbed  by  the  proposed  extension  beyond  what  is  required  to  implement  the  extension  itself. 

Explicit  dependencies.  The  dependencies  among  units  are  explicitly  specified,  not  inferred. 
The  chief  reason  for  this  is  that  dependencies  among  units  may  not  be  syntactically  evident — for 
example,  the  side  effects  of  one  unit  may  influence  the  behavior  of  another.  There  are  in  general 
many  ways  to  order  effects  consistently  with  syntactic  dependencies,  and  these  orderings  need  not 
be  equivalent.  A  lesser  reason  is  that  supporting  dependency  inference  requires  restrictions  on 
compilation  units  that  are  not  semantically  necessary,  reducing  flexibility. 

Environment  independence.  The  separate  compilation  system  is  defined  independently  of  any 
environment  in  which  it  might  be  implemented.  The  design  speaks  in  terms  of  linguistic  and 
semantic  entities,  rather  than  implementation-specific  concepts  such  as  files  or  directories. 

Separation  of  units  from  modules.  The  separate  compilation  system  is  designed  as  a  proper 
extension  to  Standard  ML  so  as  to  ensure  backward  compatibility  of  source  code.  It  is  tempting  to 
identify  compilation  units  with  modules,  but  to  do  so  would  require  that  functors,  signatures,  and 
fixity  declarations  be  permitted  as  components  of  modules.  Permitting  such  an  extension  is  not 
entirely straightforward; for example, permitting signature declarations in modules and their types
can  lead  to  undecidability  of  type  checking  [10]. 

2  Overview 

Units  and  Interfaces 

The  SMLSC  extension  is  organized  around  the  concept  of  a  unit.  A  unit  consists  of  top-level 
declarations,  which  include  declarations  of  signatures,  structures,  and  functors.  Each  unit  is  given 
a  name  by  which  the  unit  is  known  throughout  the  program.  One  unit  may  refer  to  the  components 
of  another  using  an  import  declaration,  which  records  the  dependency  of  the  importing  unit  on  the 
imported  unit,  and  opens  it  for  use  within  the  importing  unit.  This  is  the  only  means  by  which 
one  unit  may  refer  to  another;  we  do  not  support  “dot  notation”  for  accessing  the  components  of  a 
unit.  An  import  declaration  is  a  new  form  of  top-level  declaration.  (This  is  the  only  modification 
that  we  make  to  an  existing  syntactic  category  of  Standard  ML.) 

The compilation context for a unit is entirely determined by its imports. That is, all dependencies
of a unit on another unit must be explicitly indicated using import declarations. The
dependency of one unit on another is mediated by an interface, the type of a unit. The interface of
an imported unit can be specified in one of two ways, either implicitly or explicitly, corresponding
to incremental or separate compilation.

An import declaration of the form import unitid : intexp specifies an explicit interface for the
imported unit. This permits the importing unit to be compiled independently of the implementation
of unitid, relying only on the specified interface. This is called separate compilation, or SC for
short. An import declaration of the form import unitid specifies that the interface for unitid is to
be inferred from its source code. This is called incremental compilation, or IC for short.

srcunit ::= unit unitid = unit topdec end                   unit declaration
topdec  ::= import impexp                                   open units
          | strdec                                          structure-level declaration
          | sigdec                                          signature declaration
          | fundec                                          functor declaration
          | local topdec1 in topdec2 end                    local declaration
          | topdec1 topdec2
impexp  ::= unitid ⟨: intexp⟩                               open unit unitid
          | impexp1 impexp2
intexp  ::= intf topspec end                                interface expression
topspec ::= spec                                            structure-level specification
          | functor funspec                                 functor specification
          | topspec1 topspec2
funspec ::= funid(strid : sigexp) : sigexp' ⟨and funspec⟩

Figure 1: SMLSC concrete syntax

The concrete syntax for units and interfaces in SMLSC is given in Figure 1. We extend topdecs
to add import and local. The import declaration (like open) allows multiple units to be imported
simultaneously. Interfaces are topspecs; this is the syntactic class spec of Standard ML, with the
addition of a specification form for functors. The local declaration limits the scope of imports,
just as the structure-level declaration of the same name does.

Projects  and  Linksets 

A linkset consists of several compiled units, called its exports, together with the names and interfaces
of its imports, the units on which it depends (following Cardelli [5]). A project consists of a
linearly ordered sequence of source units and linksets. The ordering of the components in a project
is significant, both because it specifies the order of identifier resolution, and because it specifies the
order of computational effects when executed. Compilation of a project consists of processing the
source units in the specified order to obtain linksets, and then knitting them together to resolve
dependencies.

Linking  consists  of  resolving  inter-unit  dependencies  by  binding  exports  to  imports  among 
linksets.  When  all  references  have  been  resolved,  the  resulting  linkset  can  be  completed  to  form  an 
executable. 

We  do  not  give  a  concrete  syntax  for  linksets,  as  we  do  not  intend  for  programmers  to  write 
them,  nor  do  we  expect  compatibility  of  linksets  across  implementations.  Rather,  they  are  left 
as  implementation-specific  concepts  (such  as  object  files),  which  are  modeled  here  by  the  abstract 
semantic  objects  described  in  Sections  3  and  4. 

Examples 

We  begin  with  a  few  simple  examples  to  illustrate  the  features  of  the  system. 

Suppose  that  we  have  a  library  of  data  structures  whose  name  is  Collections.  It  is  natural  to 
place  this  library  in  a  unit.  Let’s  assume  it  contains  only  the  queue  data  structure: 

unit Collections =
unit
  signature QUEUE =
  sig
    type 'a queue
    val empty : 'a queue
    val push : 'a * 'a queue -> 'a queue
    val pop : 'a queue -> 'a * 'a queue
  end

  structure Queue :> QUEUE =
  struct (* ... *) end
end

A  client  of  the  Collections  library  can  import  it  using  IC  easily: 

unit Scheduler =
unit
  import Collections

  structure Sched =
  struct
    type job = (* ... *)
    val readyqueue =
      ref Queue.empty : job Queue.queue ref
    (* ... *)
  end
end

In  these  examples  we  use  link  to  stand  for  the  semantic  operation  of  compiling  and  linking  a  list 
of  source  units  and  linksets.  We  can  compile  and  link  this  program  as 

L = link(Collections, Scheduler)

or we can compile the library and then the client

L0 = link(Collections)
L1 = link(L0, Scheduler)

but  the  Scheduler  unit  may  not  be  compiled  on  its  own. 

Incremental  compilation  is  convenient  when  we  have  source  or  a  compiled  linkset  for  the 
Collections  unit.  We  may  prefer  to  use  separate  compilation,  or  may  be  forced  to  because 
the  implementation  for  Collections  is  not  available.  A  client  with  an  SC  import  looks  like  this: 

unit Scheduler2 =
unit
  import Collections :
  intf
    structure Queue :
    sig
      type 'a queue
      val empty : 'a queue
      val push : 'a * 'a queue -> 'a queue
      val pop : 'a queue -> 'a * 'a queue
    end
  end

  structure Sched =
  struct
    type job = (* ... *)
    val readyqueue =
      ref Queue.empty : job Queue.queue ref
    (* ... *)
  end
end

This  allows  Scheduler2  and  Collections  to  be  compiled  separately: 

L0 = link(Scheduler2)
L1 = link(Collections)
L2 = link(L1, L0)

However,  writing  the  SC  import  this  way  forces  an  undesirable  repetition  of  code.  If  more  than  one 
client  uses  Collections — which  we  would  expect — each  client  repeats  the  interface  for  its  import 
of  the  unit.  A  further  problem  is  that  this  style  asks  the  client  to  supply  the  interface  of  the  library, 
but  the  interface  of  a  library  is  usually  provided  by  the  library  author,  not  the  client.  Fortunately, 
a  combination  of  SC  and  IC  allows  us  to  use  the  system  in  a  much  cleaner  way. 
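
As an informal illustration of the examples above (it is not part of the formal semantics), the difference between IC and SC imports can be modeled in a few lines of Python. The names `Unit`, `Linkset`, and this `link` function are hypothetical; interfaces are elided entirely, so the sketch only tracks which unit names are exported and which remain imported. It assumes, as the examples suggest, that an IC import requires the imported unit to appear earlier in the project, while an SC import may be left unresolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical model of a source unit: its name and its two kinds of import."""
    name: str
    ic_imports: tuple = ()   # `import U`: interface inferred from U itself
    sc_imports: tuple = ()   # `import U : intf ... end`: interface given explicitly


@dataclass(frozen=True)
class Linkset:
    """Hypothetical model of a linkset: exported unit names plus unresolved imports."""
    exports: tuple
    imports: frozenset


def link(*items):
    """Process units and linksets in project order and knit them together."""
    exports, imports = [], set()
    for item in items:
        if isinstance(item, Unit):
            for dep in item.ic_imports:
                # IC reads off the actual interface, so the dependency must
                # already have been compiled earlier in the project.
                if dep not in exports:
                    raise ValueError(f"{item.name}: IC import {dep} is unavailable")
            provided, needed = (item.name,), item.sc_imports
        else:
            provided, needed = item.exports, item.imports
        # An SC import is resolved by an earlier export; otherwise it stays an import.
        imports.update(dep for dep in needed if dep not in exports)
        exports.extend(provided)
    return Linkset(tuple(exports), frozenset(imports))


collections = Unit("Collections")
scheduler = Unit("Scheduler", ic_imports=("Collections",))
scheduler2 = Unit("Scheduler2", sc_imports=("Collections",))
```

In this model `link(collections, scheduler)` succeeds, `link(scheduler)` is rejected, and `link(scheduler2)` yields a linkset that still imports Collections until it is linked after a linkset exporting it.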

Handoff  Units 

A  programmer  who  wishes  his  code  to  be  available  for  separate  compilation  can  provide  a  handoff 
unit  which  supplies  the  interface.  Starting  from  scratch,  the  handoff  unit  contains  an  SC  import: 

unit Collections =
unit
  signature QUEUE =
  sig
    type 'a queue
    val empty : 'a queue
    val push : 'a * 'a queue -> 'a queue
    val pop : 'a queue -> 'a * 'a queue
  end

  import CollectionsImpl :
  intf
    structure Queue : QUEUE
  end
end

The implementation of the collections library is imported from the unit CollectionsImpl.^2
Because the import declaration opens the imported unit, all of the contents of CollectionsImpl
are available in the Collections unit. Clients wishing to make use of the library simply import the
handoff unit using IC, avoiding the need to specify an interface, but instead sharing the common
interface provided by the handoff unit.

unit Scheduler3 =
unit
  import Collections

  structure Sched = (* ... *)
end

This  additionally  has  the  benefit  that  the  clients  only  need  to  know  the  name  of  the  handoff 
unit,  not  the  implementation  unit.  A  few  such  clients  can  be  linked  with  the  handoff  unit: 

L0 = link(Collections, Scheduler3, OtherClient)

The result can later be linked with the implementation of the Collections library:

L1 = link(CollectionsImpl, L0)


Definite  References 

In  the  terminology  of  Harper  and  Pierce  [11]  an  import  of  one  unit  in  another  is  interpreted  as  a 
definite  reference — that  is,  as  a  free  variable  that  refers  to  a  single,  specific  unit  through  an  interface 
for  it,  either  inferred  or  specified.  This  ensures  that  if  two  separate  units  import  a  common  unit, 
such  as  a  well-known  library,  these  units  share  a  common  understanding  of  the  abstract  types 
exported  by  that  unit. 

In contrast, the fully functorized style is to λ-abstract a module over all of the modules on
which it depends. Because a functor may, in principle, be applied to many different arguments, its
parameters are indefinite references. As such, sharing relationships among components are lost,
because they need not be true in every instance. To avoid this, one must explicitly specify the
intended sharing constraints; this can be quite burdensome.

The  handoff  methodology  in  SMLSC  facilitates  programming  with  definite  references.  Two 
pieces  of  code  that  import  the  same  unit  using  separate  compilation  may  only  be  linked  if  they 
import  that  unit  at  equivalent  interfaces.  The  imports  are  then  consolidated  into  a  single  import, 
ensuring  that  type  equations  hold.  When  the  same  handoff  unit  is  used  to  create  the  two  imports, 
these  interfaces  will  always  be  equivalent.  In  corner  cases  such  as  skew  between  versions  of  a 
library’s  handoff  unit,  the  programmer  may  manually  consolidate  two  imports.  We  discuss  this 
further  in  Section  6. 
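
The consolidation step described above can be sketched informally as follows. This is only an illustration under simplifying assumptions: the function `consolidate` and the representation of an interface as a frozen set of specification strings are hypothetical, and interface equivalence is approximated by plain equality rather than by the signature equivalence judgement of the formal semantics.

```python
def consolidate(imports_a, imports_b):
    """Merge two import tables (unit name -> interface); shared units must agree."""
    merged = dict(imports_a)
    for unit, interface in imports_b.items():
        # Two imports of the same unit are consolidated only at equal interfaces.
        if unit in merged and merged[unit] != interface:
            raise ValueError(f"{unit} is imported at inequivalent interfaces")
        merged[unit] = interface
    return merged


# Hypothetical interfaces, written as sets of specifications.
queue_intf = frozenset({"type 'a queue", "val empty : 'a queue"})
skewed_intf = frozenset({"type 'a queue"})
```

Two clients built from the same handoff unit both carry `queue_intf`, so `consolidate` returns a single import; a client built against `skewed_intf` is rejected, which corresponds to the version-skew corner case mentioned above.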

^2 If unit CollectionsImpl also needs signature QUEUE, the signature can be placed in its own unit
QUEUESIG, and both CollectionsImpl and Collections can incrementally compile against QUEUESIG.
Appendix G exemplifies this approach.

3 Semantics extending the Harper-Stone semantics of ML


In this section, we give a semantics to SMLSC by extending Harper and Stone’s Typed Semantics
(TS) for Standard ML [13]. At a high level the typed semantics consists of an elaboration
relation from an external language, called TSEL, into an internal language, called TSIL. The
external language is a slight extension of the abstract syntax of Standard ML. The internal language
is a typed λ-calculus based on the Harper-Lillibridge type theory for modules [10]. Elaboration
comprises type inference, pattern compilation, equality compilation, identifier resolution, and
insertion of coercions for signature matching. The result of elaboration is a well-formed program in
the TSIL, to which a dynamic semantics is given to provide an execution model. The semantics of
SMLSC is an extension of the Harper-Stone semantics that elaborates units into linksets that can
be completed for execution.

The TSIL. We begin with a brief review of the structure of the TSIL. The TSIL consists of a
core level and a module level. The core level includes expressions exp, constructors con, and kinds
knd. Kinds classify constructors. Constructors of kind $\Omega$ are types; they classify expressions. The
module level includes modules mod and signatures sig, which classify modules. We write {} to
denote the empty record, and mod.lab to denote the projection of a component named lab from the
structure mod. The semantics works mainly with modules, ultimately elaborating units to TSIL
structures.

Declaration lists serve as contexts in the TSIL static semantics. A declaration list
$decs = dec_1, \ldots, dec_n$ declares expression $(var : con)$, constructor $(var : knd \langle = con \rangle)$, and module
$(var : sig)$ variables. A structure declaration list sdecs has the form

$$lab_1 \triangleright dec_1, \ldots, lab_n \triangleright dec_n$$

associating a label with each declaration. The structure declaration list $lab \triangleright dec, sdecs$ binds the
variable declared by dec with scope sdecs. We write [sdecs] to denote the signature of a structure
containing fields described by sdecs. Variables express dependencies between components in a
structure signature and may be freely alpha-varied. Labels name components for external reference
and may not be renamed without changing the meaning of the signature. Consider the declaration
of a structure m containing an opaque type component T and value component X of that type:

$$m : [T \triangleright t:\Omega,\ X \triangleright x:t].$$

We can systematically rename the bound variables t and x. A path is a module variable followed
by a list of labels, serving a role similar to SML long identifiers. The paths m.T (a constructor)
and m.X (an expression) refer to m’s components.

A bnd binds a variable to an expression $(var = exp)$, constructor $(var = con)$, or module $(var = mod)$.
A structure binding list sbnds has the form

$$lab_1 \triangleright bnd_1, \ldots, lab_n \triangleright bnd_n.$$

A structure is written [sbnds]. The module syntax is closed under the formation of functors:
dependently typed functions from modules to modules.

We shall use the TSIL judgements given in Figure 2. These judgements have the following
meaning.

• $decs \vdash sdecs\ ok$. No label is used twice and every declaration is well-formed. For example,

$$\vdash T \triangleright t:\Omega,\ X \triangleright x:t\ \ ok.$$


Judgement                               Meaning
$\vdash decs\ ok$                       decs is well-formed
$decs \vdash sdecs\ ok$                 sdecs is well-formed
$decs \vdash sig : Sig$                 sig is well-formed
$decs \vdash sig = sig' : Sig$          signature equivalence
$decs \vdash sbnds : sdecs$             sbnds has declaration list sdecs
$decs \vdash mod : sig$                 mod has signature sig

Figure 2: TSIL judgements (summary)


Judgement                                                            Meaning
$\Gamma \vdash strdec \leadsto sbnds : sdecs$                        structure declaration elaboration
$\Gamma \vdash sigexp \leadsto sig : Sig$                            signature elaboration
$\Gamma \vdash spec \leadsto sdecs$                                  specification elaboration
$\Gamma \vdash_{ctx} labs \leadsto path : class$                     context lookup
$decs \vdash_{sub} path : sig_0 \preceq sig \leadsto mod : sig'$     coercion compilation

Figure 3: TS elaboration judgements (summary)


• $decs \vdash sig : Sig$. The signature sig is well-formed.

• $decs \vdash sig = sig' : Sig$. The signatures sig and sig' declare the same components, in the same
order, with the same labels, and corresponding type components are equivalent.

• $decs \vdash sbnds : sdecs$. The structure binding list sbnds matches the structure declaration list
sdecs. Corresponding labels must agree and each bound expression, constructor, or module
in sbnds must match its declaration in sdecs. For example, the judgement

$$decs \vdash (lab \triangleright var = mod,\ sbnds) : (lab \triangleright var:sig,\ sdecs)$$

holds if $decs \vdash mod : sig$ and $decs, var:sig \vdash sbnds : sdecs$.

• $decs \vdash mod : sig$. The module mod has signature sig. The signature sig may or may not be
fully transparent. For example, we may derive both

$$m : [T \triangleright t:\Omega,\ X \triangleright x:t] \vdash m : [T \triangleright t:\Omega = m.T,\ X \triangleright x:t]$$

and

$$m : [T \triangleright t:\Omega,\ X \triangleright x:t] \vdash m : [T \triangleright t:\Omega,\ X \triangleright x:t].$$

The former signature is said to be selfified with respect to the variable m.


TS elaboration. Harper and Stone give a semantics to Standard ML by elaboration of TSEL
into TSIL. Elaboration is performed in a context $\Gamma$ consisting of a structure declaration list (sdecs)
that, due to shadowing, may have duplicate labels. We shall use the TS elaboration judgements
given in Figure 3. These judgements have the following meaning.


• $\Gamma \vdash strdec \leadsto sbnds : sdecs$. Elaborate the TSEL structure declaration strdec to the structure
binding list sbnds : sdecs. Since the TSEL permits functors within structures, this includes
elaboration of functor declarations.

• $\Gamma \vdash sigexp \leadsto sig : Sig$. Elaborate the TSEL signature expression sigexp to the signature sig.
The TSEL does not include signature declarations; we treat them as abbreviations for TSIL
signatures, recording them in linksets and expanding them during elaboration.

• $\Gamma \vdash spec \leadsto sdecs$. Elaborate the TSEL specification spec to the structure declaration list
sdecs. This includes elaboration of functor specifications.

• $\Gamma \vdash_{ctx} labs \leadsto path : class$. Perform identifier resolution in the context $\Gamma$. The input is a list of
labels, which is derived from an SML long identifier; the output is a path classified by the type,
kind, or signature class. Some labels in the context are annotated with a star, indicating that
they are “open” (in the sense of the SML open declaration). Identifier resolution searches $\Gamma$
from right to left, descending into structures with starred labels. For example, we may derive

$$T \triangleright t_1:\Omega,\ T \triangleright t_2:\Omega=\{\} \vdash_{ctx} T \leadsto t_2 : \Omega=\{\}$$

and

$$X \triangleright x_1:\{\},\ 1^{*} \triangleright m:[T \triangleright t:\Omega,\ X \triangleright x_2:t] \vdash_{ctx} X \leadsto m.X : m.T.$$

• $decs \vdash_{sub} path : sig_0 \preceq sig \leadsto mod : sig'$. Perform transparent signature ascription. The
inputs are a signature $sig_0$, a path having that signature, and a target signature sig. The
output is a module mod : sig', where sig' has the same shape as sig but is fully transparent
relative to path.
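
The right-to-left search with descent into starred structures can be sketched informally as follows. This is an illustrative model only, not the elaboration relation itself: the `Entry` record and `lookup` function are hypothetical, classes are kept as plain text (with `Omega` standing for the kind of types), and the only dependency handled is a class that names an earlier sibling's variable.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Entry:
    """One labelled declaration: label, bound variable, class, and the 'open' star."""
    label: str
    var: str
    cls: Union[str, tuple]   # a class written as text, or a tuple of Entry for a structure
    starred: bool = False


def _externalize(cls, earlier, root):
    # Inside a structure, a class naming an earlier sibling's variable is
    # re-expressed as a path through the enclosing structure (t becomes m.T).
    if root is not None and isinstance(cls, str):
        for sibling in earlier:
            if sibling.var == cls:
                return f"{root}.{sibling.label}"
    return cls


def lookup(ctx, labs, root=None):
    """Resolve the label list `labs` in `ctx`; return (path, class) or None."""
    for i in range(len(ctx) - 1, -1, -1):   # right to left: later entries shadow earlier ones
        entry = ctx[i]
        path = entry.var if root is None else f"{root}.{entry.label}"
        if entry.label == labs[0]:
            if len(labs) == 1:
                return path, _externalize(entry.cls, ctx[:i], root)
            if isinstance(entry.cls, tuple):
                return lookup(entry.cls, labs[1:], root=path)
            return None
        if entry.starred and isinstance(entry.cls, tuple):
            hit = lookup(entry.cls, labs, root=path)   # descend into an open structure
            if hit is not None:
                return hit
    return None
```

Encoding the two derivations above as contexts, a lookup of T finds the rightmost binding, and a lookup of X is answered from inside the starred structure rather than by the earlier top-level X.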

Elaboration maps TSEL identifiers to TSIL labels using a function $\overline{\,\cdot\,}$. To implement identifier
“shadowing,” elaboration employs a function $sbnds +\!\!+ sbnds' : sdecs +\!\!+ sdecs'$ that concatenates
sbnds : sdecs and sbnds' : sdecs', renaming labels in the left hand sides that appear in the right
hand sides. The function chooses fresh labels that do not correspond to TSEL identifiers. For
example, if

$$sbnds : sdecs = T \triangleright t_1 = \{\} : T \triangleright t_1:\Omega = \{\}$$
$$sbnds' : sdecs' = T \triangleright t_2 = Int : T \triangleright t_2:\Omega = Int,$$

then $sbnds +\!\!+ sbnds' : sdecs +\!\!+ sdecs'$ might be

$$(lab \triangleright t_1 = \{\},\ T \triangleright t_2 = Int) : (lab \triangleright t_1:\Omega = \{\},\ T \triangleright t_2:\Omega = Int)$$

where lab is not in the range of the $\overline{\,\cdot\,}$ function.

3.1  Linking 

We define linking for the TSIL by giving rules for deriving the linking judgements in Figure 4. A
linkset

$$sdecs_0 \to sbnds : sdecs;\ S$$

comprises imports $sdecs_0$, exports sbnds : sdecs, and signature abbreviations S.

Judgement                                      Meaning
$decs \vdash L\ ok$                            L is well-formed
$decs \vdash S\ ok$                            S is well-formed
$L \leadsto exp : \{\}$                        L completes to exp
$decs \vdash L +\!\!+ L' \leadsto L''$         L and L' merge to L''

Figure 4: Linking judgements


L ::= $sdecs_0 \to sbnds : sdecs;\ S$          linkset
S ::= $\cdot$
    | S, sigid = sig                           top-level
    | S, unitid = S'                           declared by unitid

Figure 5: Linkset syntax


• The imports $sdecs_0$ describe the TSIL structures on which the linkset depends; they must be
well-formed in the ambient context. For example, the imports

$$sdecs_{AB} = A \triangleright a:[T \triangleright t:\Omega,\ X \triangleright x:t],\ B \triangleright b:[Y \triangleright y:a.T]$$

express dependency on structures labelled A and B.

Imports specify assumptions to be satisfied by linking. A linkset with imports $sdecs_{AB}$
assumes structure B binds (at least) a value Y : a.T but can be linked with (a linkset
exporting) a structure B providing more components.

• The exports sbnds : sdecs are the TSIL code associated with the linkset. They may make
reference to the linkset’s imports. Continuing our example, the exports

$$sbnds_{ZR} : sdecs_{ZR} = Z \triangleright z = b.Y,\ R \triangleright r = a.T \ :\ Z \triangleright z:a.T,\ R \triangleright r:\Omega = a.T$$

reference the imports $sdecs_{AB}$ to bind an expression Z of the imported type and an equivalent
type R.

• The signature abbreviations S are used during elaboration. They may make reference to the
linkset’s imports and exports. Continuing our example, the signature abbreviations

$$S_{SIG} = SIG = [M \triangleright m:\Omega = r]$$

specify that elaboration should treat the signature identifier SIG as an abbreviation for a
TSIL signature referencing the exported type R.

The dynamic semantics for SMLSC is very simple. The completion judgement $L \leadsto exp : \{\}$
translates a linkset

$$\cdot \to sbnds : sdecs;\ S$$

with no imports to a TSIL expression

$$[sbnds,\ lab \triangleright var = \{\}].lab : \{\}$$

where lab and var are fresh. Under the TSIL dynamic semantics, the resulting expression evaluates
the linkset’s exports from left to right for their side-effects. Evaluation terminates when an uncaught
exception is raised or when every export has been evaluated.

We  give  the  full  syntax  for  linksets  in  Figure  5  and  the  rules  in  Appendix  A.  The  remainder  of 
this  section  explains  the  rules  for  linkset  merge. 

Notation. We write $\mathit{decs}, \mathit{sdecs}$ to extend a context decs, implicitly dropping the labels in sdecs.
We define the domain of a structure declaration list, dom(sdecs), by

$$\mathrm{dom}(\mathit{lab}_1 \triangleright \mathit{dec}_1, \ldots, \mathit{lab}_n \triangleright \mathit{dec}_n) = \{\mathit{lab}_1, \ldots, \mathit{lab}_n\}.$$

We write $\{\mathit{var}/\mathit{var}'\}L$ for the capture-free substitution of var for free occurrences of var' in L.*

Linkset merge. The rules for linkset merge $\mathit{decs} \vdash L_1 +\!\!+ L_2 \leadsto L_3$ combine $L_1$ and $L_2$ to produce
$L_3$. The rules presuppose that $L_1$ is well-formed with respect to decs but permit $L_2$ to make
reference (via free TSIL variables) not only to decs but to the imports and exports of $L_1$. Formally,
the rules satisfy the following property.†

If $L_1 = \mathit{sdecs}_1 \to \mathit{sbnds} : \mathit{sdecs}$
and $\mathit{decs} \vdash L_1\ \mathrm{ok}$
and $\mathit{decs}, \mathit{sdecs}_1, \mathit{sdecs} \vdash L_2\ \mathrm{ok}$
and $\mathit{decs} \vdash L_1 +\!\!+ L_2 \leadsto L_3$,
then $\mathit{decs} \vdash L_3\ \mathrm{ok}$.

If a linkset is well-formed, then it neither imports nor exports the same label twice (although it
may both import and export a particular label).

The rules process the imports in $L_2$ from left to right. If $L_2$ has no imports, then the following
rule applies.

$$\frac{L = \mathit{sdecs}_0 \to \mathit{sbnds} : \mathit{sdecs}}
{\mathit{decs} \vdash L +\!\!+ (\cdot \to \mathit{sbnds}' : \mathit{sdecs}') \leadsto \mathit{sdecs}_0 \to \mathit{sbnds} +\!\!+ \mathit{sbnds}' : \mathit{sdecs} +\!\!+ \mathit{sdecs}'}$$

$L_3$ imports what $L_1$ does and exports what $L_1$ and $L_2$ do. To ensure that $L_3$ is well-formed, the
rule uses the TS function $+\!\!+$ to concatenate the exports in $L_1$ with the exports in $L_2$, renaming
labels exported by $L_1$ that are also exported by $L_2$.

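Otherwise, the rules examine the first import $\mathit{lab} \triangleright \mathit{var} : \mathit{sig}$ in $L_2$ and distinguish three mutually
exclusive cases:

• $L_1$ exports lab.

$$\frac{\begin{array}{c}
\mathit{sdecs} = \mathit{sdecs}'',\ \mathit{lab} \triangleright \mathit{var}' : \mathit{sig}',\ \mathit{sdecs}''' \\
\mathit{decs}, \mathit{sdecs}_0, \mathit{sdecs} \vdash_{\mathrm{sub}} \mathit{var}' : \mathit{sig}' \le \mathit{sig} \leadsto \mathit{mod} : \mathit{sig}'' \\
\mathit{sbnd} := l \triangleright \mathit{var} = \mathit{mod} \qquad \mathit{sdec} := l \triangleright \mathit{var} : \mathit{sig}'' \\
L := \mathit{sdecs}_0 \to \mathit{sbnds} +\!\!+ \mathit{sbnd} : \mathit{sdecs} +\!\!+ \mathit{sdec} \\
\mathit{decs} \vdash L +\!\!+ (\mathit{sdecs}_1 \to \mathit{sbnds}' : \mathit{sdecs}') \leadsto L''
\end{array}}
{\mathit{decs} \vdash (\mathit{sdecs}_0 \to \mathit{sbnds} : \mathit{sdecs}) +\!\!+ (\mathit{lab} \triangleright \mathit{var} : \mathit{sig},\ \mathit{sdecs}_1 \to \mathit{sbnds}' : \mathit{sdecs}') \leadsto L''}$$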

The first premise picks out the $L_1$ export $\mathit{lab} \triangleright \mathit{var}' : \mathit{sig}'$ for lab; there can be at most one
since $L_1$ is well-formed. The second premise calls the TS coercion compiler to match the
export $\mathit{var}' : \mathit{sig}'$ to the import signature sig. Linking fails if no match is possible; otherwise,
$\mathit{sig}''$ has the same “shape” as sig, but is fully transparent relative to the variable $\mathit{var}'$. The
structure binding $\mathit{sbnd} : \mathit{sdec}$ is constructed using the coercion module mod at the signature
$\mathit{sig}''$, maximizing type sharing. The linkset L has the same imports as $L_1$, and exports those
of $L_1$ plus the result of the preceding coercion. To ensure that L is well-formed (in particular,
that it exports nothing more than once), the rule uses $+\!\!+$ to construct its exports.

* Linkset bound variables and scopes are discussed in Appendix A.

† In this description of linkset merge, we suppress all details related to signature abbreviations.

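• $L_1$ imports lab but does not export it.

$$\frac{\begin{array}{c}
\mathit{lab} \notin \mathrm{dom}(\mathit{sdecs}) \\
\mathit{sdecs}_0 = \mathit{sdecs}'',\ \mathit{lab} \triangleright \mathit{var}' : \mathit{sig}',\ \mathit{sdecs}''' \\
\mathit{decs}, \mathit{sdecs}_0, \mathit{sdecs} \vdash \mathit{sig} \equiv \mathit{sig}' : \mathrm{Sig} \\
L' := \{\mathit{var}'/\mathit{var}\}(\mathit{sdecs}_1 \to \mathit{sbnds}' : \mathit{sdecs}') \\
\mathit{decs} \vdash (\mathit{sdecs}_0 \to \mathit{sbnds} : \mathit{sdecs}) +\!\!+ L' \leadsto L''
\end{array}}
{\mathit{decs} \vdash (\mathit{sdecs}_0 \to \mathit{sbnds} : \mathit{sdecs}) +\!\!+ (\mathit{lab} \triangleright \mathit{var} : \mathit{sig},\ \mathit{sdecs}_1 \to \mathit{sbnds}' : \mathit{sdecs}') \leadsto L''}$$

The first premise ensures $L_1$ does not export lab. The second premise picks out the $L_1$ import
$\mathit{lab} \triangleright \mathit{var}' : \mathit{sig}'$. Linking fails if sig and $\mathit{sig}'$ are not equivalent; otherwise, $L'$ is constructed by
changing references in the remainder of $L_2$ to use the import in $L_1$.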

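• $L_1$ neither imports nor exports lab.

$$\frac{\begin{array}{c}
\mathit{lab} \notin \mathrm{dom}(\mathit{sdecs}) \cup \mathrm{dom}(\mathit{sdecs}_0) \\
\mathit{decs}, \mathit{sdecs}_0, \mathit{sdecs} \vdash \mathit{sig} \equiv \mathit{sig}' : \mathrm{Sig} \\
\mathit{decs}, \mathit{sdecs}_0 \vdash \mathit{sig}' : \mathrm{Sig} \\
L := \mathit{sdecs}_0,\ \mathit{lab} \triangleright \mathit{var} : \mathit{sig}' \to \mathit{sbnds} : \mathit{sdecs} \\
\mathit{decs} \vdash L +\!\!+ (\mathit{sdecs}_1 \to \mathit{sbnds}' : \mathit{sdecs}') \leadsto L''
\end{array}}
{\mathit{decs} \vdash (\mathit{sdecs}_0 \to \mathit{sbnds} : \mathit{sdecs}) +\!\!+ (\mathit{lab} \triangleright \mathit{var} : \mathit{sig},\ \mathit{sdecs}_1 \to \mathit{sbnds}' : \mathit{sdecs}') \leadsto L''}$$

The first premise ensures that $L_1$ neither imports nor exports lab. The next two premises
choose a signature $\mathit{sig}'$ equivalent to sig but well-formed without reference to the exports of
$L_1$. Linking fails if no such signature exists, which happens when opaque types exported by $L_1$ occur in sig.
Otherwise, L is constructed by adding a new import to the imports in $L_1$.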
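
The three-way case analysis above can be sketched as a small executable model. This is only an illustration under strong simplifying assumptions, not the TSIL rules themselves: a signature is modelled as a dictionary from component names to type names, matching is width-only (no coercion module is built), equivalence is dictionary equality, and the check that a new import does not mention opaque types exported by $L_1$ is omitted. The names `Linkset`, `matches`, `merge`, and `LinkError` are hypothetical.

```python
from dataclasses import dataclass, field


class LinkError(Exception):
    """Raised when an import of L2 cannot be satisfied."""


@dataclass
class Linkset:
    # label -> signature; a signature is modelled as {component name: type}
    imports: dict = field(default_factory=dict)
    exports: dict = field(default_factory=dict)


def matches(provided: dict, required: dict) -> bool:
    # Width matching only: every required component is provided at the same type.
    return all(provided.get(name) == ty for name, ty in required.items())


def merge(l1: Linkset, l2: Linkset) -> Linkset:
    imports, exports = dict(l1.imports), dict(l1.exports)
    for label, sig in l2.imports.items():  # L2's imports, left to right
        if label in exports:               # case 1: L1 exports label
            if not matches(exports[label], sig):
                raise LinkError(f"export {label} does not match its import")
        elif label in imports:             # case 2: L1 imports label only
            if imports[label] != sig:
                raise LinkError(f"imports of {label} are not equivalent")
        else:                              # case 3: L1 does neither
            imports[label] = sig
    exports.update(l2.exports)             # an L2 export shadows an L1 export
    return Linkset(imports, exports)


l1 = Linkset(imports={"A": {"T": "type"}},
             exports={"B": {"Y": "int", "W": "bool"}})
l2 = Linkset(imports={"B": {"Y": "int"}, "A": {"T": "type"}, "C": {"Z": "int"}},
             exports={"D": {"V": "int"}})
l3 = merge(l1, l2)
print(sorted(l3.imports), sorted(l3.exports))
```

In this run the import of B is satisfied by the export of $L_1$ (which provides an extra component W), the import of A is shared with $L_1$, and C becomes a new import of the merged linkset.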

3.2  Elaboration 

We define a semantics for SMLSC by giving rules for the elaboration judgements in Figure 6. We
give the abstract syntax for SMLSC in Figure 7. The elaboration rules appear in Appendix B.

Judgement                                                                   Meaning

$\mathit{project} \leadsto L$                                               project elaboration
$\Gamma \vdash \mathit{srcunit} \leadsto L$                                 unit elaboration
$\Gamma \vdash \mathit{topdec} \leadsto L$                                  top-level declaration elaboration
$\Gamma \vdash \mathit{impexp} \leadsto L$                                  import expression elaboration
$\Gamma \vdash \mathit{sigbind} \leadsto S$                                 signature binding elaboration
$\Gamma \vdash_{\mathrm{ctx}} \mathit{sigid} \leadsto \mathit{sig} : \mathrm{Sig}$      signature lookup
$\Gamma \vdash_{\mathrm{ctx}} \mathit{unitid} \leadsto S$
$\Gamma\ \mathrm{ok}$                                                       $\Gamma$ is well-formed

Figure 6: Elaboration judgements

These judgements have the following meaning.

• $\mathit{project} \leadsto L$. Elaborate project, using linkset merge to accumulate a resulting linkset L. A
source unit is elaborated in a context $\Gamma$ that declares the imports and exports in L.

• $\Gamma \vdash \mathit{srcunit} \leadsto L$. Elaborate the topdec in srcunit to the linkset

$$\mathit{sdecs}_0 \to \mathit{sbnds} : \mathit{sdecs};\ S.$$


The imports $\mathit{sdecs}_0$ arise from the import declarations in topdec. The exports $\mathit{sbnds} : \mathit{sdecs}$
arise from the structure declarations in topdec. The signature abbreviations $S$ arise from the
signature declarations in topdec. The result, L, exports a single module

$$\mathit{unitid} \triangleright \mathit{var} = [\mathit{sbnds}] : \mathit{unitid} \triangleright \mathit{var} : [\mathit{sdecs}].$$

• $\Gamma \vdash \mathit{topdec} \leadsto L$. Elaborate topdec using linkset merge and identifier resolution.

• $\Gamma \vdash \mathit{impexp} \leadsto L$. Elaborate impexp using identifier resolution and spec elaboration.

• $\Gamma \vdash \mathit{sigbind} \leadsto S$. Elaborate sigbind using signature elaboration.

project ::= ·                                        empty
          | project, srcunit                         source unit
          | project, L                               compiled unit(s)

srcunit ::= unit unitid = topdec                     unit declaration

topdec  ::= import impexp                            open units
          | strdec
          | signature sigbind
          | local topdec1 in topdec2 end
          | topdec1 topdec2

impexp  ::= unitid ⟨: intf spec end⟩                 open unitid
          | impexp1 impexp2

sigbind ::= sigid = sigexp ⟨and sigbind⟩

Figure 7: SMLSC abstract syntax


4  Semantics  extending  The  Definition  of  Standard  ML 

In  this  section,  we  give  a  semantics  to  SMLSC  by  extending  The  Definition  of  Standard  ML  (TD)  [16]. 
TD  gives  a  semantics  to  SML  by  relating  it  to  semantic  objects — mathematical  sets,  functions,  and 
so  on.  We  refer  to  these  semantic  objects  collectively  as  the  internal  language  (TDIL)  and  to 
SML  as  the  external  language  (TDEL).  The  TDIL  is  partitioned  into  static  and  dynamic  semantic 
objects.  TD’s  static  semantics  specifies  type  checking  and  type  inference  using  the  static  TDIL. 
TD’s dynamic semantics gives the TDEL a big-step, call-by-value operational semantics using the
dynamic TDIL.

A unit declaration list udecs is a list of source units $\mathit{srcunit}_1, \ldots, \mathit{srcunit}_n$ (see Figure 8). In
Section  4.1  we  review  TD’s  static  semantics  and  extend  it  to  unit  declaration  lists.  In  Section  4.2  we 
do  the  same  for  TD’s  dynamic  semantics.  We  define  linksets  and  linking  in  Section  4.3.  A  linkset 
contains  source  code — a  unit  declaration  list — and  static  TDIL:  Separate  compilation  corresponds 
to  separate  type  checking.  In  Section  4.4  we  give  a  semantics  to  SMLSC  through  an  elaboration 
into  linksets. 

4.1  Static  Semantics 

We  shall  use  the  TD  static  semantic  judgements  given  in  Figure  9.  These  judgements  have  the 
following  meaning. 

• $B \vdash \mathit{strdec} \Rightarrow E$. The structure declaration strdec is well-typed and declares the structure,
type, and value identifiers in environment E.

• $B \vdash \mathit{sigdec} \Rightarrow G$. The signature declaration sigdec is well-formed and declares the signature
identifiers in signature environment G.

• $B \vdash \mathit{fundec} \Rightarrow F$. The functor declaration fundec is well-typed and declares the functor
identifiers in functor environment F.

• $B \vdash \mathit{sigexp} \Rightarrow \Sigma$. The signature expression sigexp is well-formed and specifies the components
in signature $\Sigma$.


udecs ::= ·                       empty
        | udecs, srcunit          unit declaration

Figure 8: Unit syntax

Judgement                                            Meaning

$B \vdash \mathit{strdec} \Rightarrow E$             structure declaration elaboration
$B \vdash \mathit{sigdec} \Rightarrow G$             signature declaration elaboration
$B \vdash \mathit{fundec} \Rightarrow F$             functor declaration elaboration
$B \vdash \mathit{sigexp} \Rightarrow \Sigma$        signature expression elaboration
$B \vdash \mathit{spec} \Rightarrow E$               specification elaboration
$\Sigma \ge E$ using $\varphi$                       signature instantiation
$E_1 \succ E_2$                                      enrichment

Figure 9: TD’s static semantic judgements (summary)


• $B \vdash \mathit{spec} \Rightarrow E$. The specification spec is well-formed and specifies the components in E.

• $\Sigma \ge E$ using $\varphi$. The environment E is an instance of the signature $\Sigma$ using the realization $\varphi$.

• $E_1 \succ E_2$. The environment $E_1$ may have more components than $E_2$, it may be less polymorphic,
and it may change the status of constructors/exceptions to values.

One subtlety in these judgements pervades our semantics: They account for TDEL type sharing
by stamping TDIL types with type names and TDEL type generativity by using state-passing to
track the set of type names that “have been generated”. Two TDIL types share if they are stamped
with the same type name. To see how state-passing works, consider the judgement

$$B \vdash \mathit{strdec} \Rightarrow E.$$

The basis $B = T, F, G, E'$ comprises a context and a state. The context $F, G, E'$ assigns static
TDIL to those identifiers that may occur free in strdec. The state T is a set of type names. Rules
that generate types (e.g., the rule for datatype declarations) choose type names not in T: Types
stamped with such names do not share with any types in B.

TDEL signature matching complicates the tracking of type names. The rule for opaque signature
ascription strexp :> sigexp generates types after elaborating strexp and sigexp. Consider the
following structure declaration.

    structure A = struct type a = int type b = a end
                  :> sig type a type b = a end

Types A.a and A.b share but neither shares with int. At the level of the TDIL, A.a must be
stamped with a new type name and A.b must be stamped with the same type name. To handle
the book-keeping, a TDIL signature $\Sigma = (T)E$ comprises a set T of bound type names (induced
by abstract type specifications) and an environment E describing its components. The example
elaborates as follows.

1. The inner structure expression “struct type a = int type b = a end” elaborates to an
environment E mapping type constructors a and b to types stamped with $t_{int}$ (where $t_{int}$ is
the type name associated with int in the ambient basis).

2. The signature expression elaborates to a signature $\Sigma = (T)E'$ where $T = \{t\}$ binds one type
name and $E'$ maps the type constructors a and b to types stamped with t.
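$$\begin{array}{rcl}
\mathit{uspecs}\ \text{or}\ \mathit{unitid}_1{:}\Upsilon_1, \ldots, \mathit{unitid}_n{:}\Upsilon_n & \in & \mathrm{UnitSpecs} = \bigcup_{n \ge 0} \mathrm{UnitSpecs}^n \\
 & & \mathrm{UnitSpecs}^n = (\mathrm{UnitId} \times \mathrm{UnitSig})^n \\
\Upsilon\ \text{or}\ (T)(F, G, E) & \in & \mathrm{UnitSig} = \mathrm{TyNameSet} \times (\mathrm{FunEnv} \times \mathrm{SigEnv} \times \mathrm{Env}) \\
\Gamma\ \text{or}\ B, U & \in & \mathrm{UnitBasis} = \mathrm{Basis} \times \mathrm{UnitEnv} \\
U & \in & \mathrm{UnitEnv} = \mathrm{UnitId} \xrightarrow{\mathrm{fin}} \mathrm{Basis} \\
\mathit{unitid} & \in & \mathrm{UnitId}\ \text{(unit identifiers)}
\end{array}$$

Figure 10: Static TDIL for unit declaration lists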


3. The opaque ascription (a structure expression) elaborates to $E'$ after the bound type name in
$\Sigma$ is systematically renamed so that it differs from all type names in the ambient basis (e.g.,
$t \ne t_{int}$).

4. The structure declaration elaborates to an environment $E_A$ mapping the structure identifier
A to the environment $E'$ obtained in (3). Thus the types A.a and A.b in $E_A$ are stamped
with a fresh type name as required.
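
The renaming in steps (3) and (4) can be sketched in a few lines. This is a toy model under stated assumptions, not TD’s rule: types are represented only by the type name they are stamped with, a signature $(T)E$ is a set of bound names plus a dictionary from type constructors to names, and the check that the structure actually matches the signature is omitted. The function names `fresh_name` and `opaque_ascribe` are hypothetical.

```python
import itertools

_stamps = itertools.count()


def fresh_name(state: set) -> str:
    # Choose a type name that does not occur in the current state.
    while True:
        name = f"t{next(_stamps)}"
        if name not in state:
            return name


def opaque_ascribe(state: set, bound: set, sig_env: dict):
    """Model `strexp :> sigexp` for the signature (bound)sig_env.

    Every bound type name is renamed to a name outside the state, and the
    state is extended with the names that were generated.
    """
    new_state = set(state)
    renaming = {}
    for t in sorted(bound):
        renaming[t] = fresh_name(new_state)
        new_state.add(renaming[t])
    env = {tycon: renaming.get(t, t) for tycon, t in sig_env.items()}
    return env, new_state


# The signature of the example: T = {t}, E' maps both a and b to t.
state = {"t_int"}
env_A, state_after = opaque_ascribe(state, {"t"}, {"a": "t", "b": "t"})
print(env_A)
```

In the resulting environment the constructors a and b carry the same stamp, so they share, and that stamp differs from `t_int`, so neither shares with int.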

The rule for transparent signature ascription strexp : sigexp induces sharing after elaborating strexp
and sigexp. Consider the following structure declaration.

    structure B = struct type a = int type b = a end
                  : sig type b end

Types B.b and int share. To induce sharing, TD uses capture-avoiding substitution from type
names to types: The rule chooses and applies a realization $\varphi$. This example elaborates as follows.

1. The inner structure expression elaborates as in the preceding example.

2. The signature expression elaborates to a signature $\Sigma' = (T')E''$ where $T' = \{t'\}$ binds one
type name and $E''$ maps the type constructor b to a type stamped with $t'$.

3. The transparent ascription elaborates to $\varphi(E'')$ where $\varphi(\cdot)$ can be applied to any semantic
object A to substitute $t_{int}$ (the type name associated with b in E) for free occurrences of $t'$
in A.

4. The structure declaration elaborates to an environment $E_B$ mapping the structure identifier
B to the environment $\varphi(E'')$. Thus the type B.b in $E_B$ is stamped with $t_{int}$ as required.

Both  ascription  rules  employ  the  signature  instantiation  and  enrichment  relations  to  define  TDEL 
signature  matching  in  terms  of  TDIL  environments  and  signatures. 
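
For the transparent case, choosing and applying a realization can be sketched as follows. As before this is a simplified illustration with assumptions that are not TD’s definitions: a type is identified with its type name, so a realization is a finite map from bound type names to type names, and enrichment is reduced to agreement on the specified type constructors. The names `realize` and `transparent_ascribe` are hypothetical.

```python
def realize(phi: dict, env: dict) -> dict:
    # Apply the realization to every type name occurring in the environment.
    return {tycon: phi.get(t, t) for tycon, t in env.items()}


def transparent_ascribe(struct_env: dict, bound: set, sig_env: dict):
    """Model `strexp : sigexp` for the signature (bound)sig_env."""
    phi = {}
    for tycon, t in sig_env.items():
        if tycon not in struct_env:
            raise ValueError(f"structure does not define {tycon}")
        if t in bound:
            # Instantiate the bound name with the structure's own type.
            phi.setdefault(t, struct_env[tycon])
    result = realize(phi, sig_env)
    for tycon, t in result.items():
        # The structure must agree with the instantiated specification.
        if struct_env[tycon] != t:
            raise ValueError(f"{tycon} does not match its specification")
    return result, phi


# E from step (1); the signature of step (2): T' = {t'}, E'' maps b to t'.
E = {"a": "t_int", "b": "t_int"}
E_B, phi = transparent_ascribe(E, {"t'"}, {"b": "t'"})
print(E_B, phi)
```

The realization sends `t'` to `t_int`, so the only visible component b is stamped with `t_int`, while a is no longer visible through the ascription.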

Unit Static Semantics. In Figure 10 we extend the static TDIL for unit declaration lists. These
TDIL categories (disjoint from all others) build on TD’s categories TyNameSet, FunEnv, SigEnv,
Env, and Basis (see Appendix C). When specifying TDIL we use the following notation.

• $A \times B$ denotes the cartesian product of A and B.

• $A \cup B$ denotes the disjoint union of A and B.

• $A \xrightarrow{\mathrm{fin}} B$ denotes the set of partial functions from A to B with finite domain.
Judgement                                                  Meaning

$\Gamma \vdash \mathit{udecs} : \mathit{uspecs}$           udecs has specification list uspecs
$\Gamma \vdash \mathit{topdec} : B$                        topdec has basis B
$\Gamma \vdash \mathit{impexp} : F, G, E$                  impexp has components F, G, E

$B \vdash \mathit{intexp} \Rightarrow \Upsilon$            interface expression elaboration
$B \vdash \mathit{topspec} \Rightarrow F, E$               top-level specification elaboration
$B \vdash \mathit{funspec} \Rightarrow F$                  functor specification elaboration

$\Gamma\ \mathrm{ok}$                                      $\Gamma$ is well-formed
$\Gamma \vdash \Upsilon : \mathrm{Sig}$                    $\Upsilon$ is well-formed
$\Gamma \vdash \mathit{uspecs}\ \mathrm{ok}$               uspecs is well-formed

$E : \Sigma$ using $\varphi$                               signature matching
$\Sigma = \Sigma'$ using $\varphi$                         signature equivalence
$\Upsilon = \Upsilon'$ using $\varphi$                     interface equivalence
$F, G, E : \Upsilon$ using $\varphi$                       interface matching

Figure 11: Unit static semantics

• $A^n$ denotes a sequence of length $n \ge 0$ whose range is a subset of A.

A unit signature (or interface) $\Upsilon = (T)(F, G, E)$ describes a unit. $\Upsilon$ specifies the components
in environments F, G, and E, binding type names T with scope $(F, G, E)$.

A unit specification list $\mathit{unitid}_1{:}\Upsilon_1, \ldots, \mathit{unitid}_n{:}\Upsilon_n$ describes a unit declaration list. Writing
$\mathrm{BT}(\Upsilon)$ for the type names bound in $\Upsilon$, a unit specification list binds $\mathrm{BT}(\Upsilon_i)$ with scope
$\mathit{unitid}_{i+1}{:}\Upsilon_{i+1}, \ldots, \mathit{unitid}_n{:}\Upsilon_n$ for each $1 \le i < n$.

A unit basis $\Gamma = B, U$ (where $B = T, F, G, E$) comprises a state T and a context F, G, E, U.
The unit environment U is a finite map from unit identifiers to bases: If $U(\mathit{unitid}) = T', F', G', E'$,
then $T' \subseteq T$ records the type names generated by unitid and the environments $F', G', E'$ describe
its components.

We give a static semantics to unit declaration lists by giving rules for the judgements in Figure 11.
The rules appear in Appendix C. These judgements have the following meaning.

• $\Gamma \vdash \mathit{udecs} : \mathit{uspecs}$. The unit declaration list udecs matches the unit specification list uspecs.
Corresponding unit identifiers must agree and each source unit in udecs must match its
specification in uspecs. For example, the judgement

$$\Gamma \vdash (\mathtt{unit}\ \mathit{unitid} = \mathtt{unit}\ \mathit{topdec}\ \mathtt{end},\ \mathit{udecs}) : (\mathit{unitid}{:}(T)(F, G, E),\ \mathit{uspecs})$$

holds if $\Gamma \vdash \mathit{topdec} : T, F, G, E$ and $\Gamma + T + \{\mathit{unitid} \mapsto T, F, G, E\} \vdash \mathit{udecs} : \mathit{uspecs}$.*

• $\Gamma \vdash \mathit{topdec} : B$. The top-level declaration topdec has basis $B = T, F, G, E$: It generates type
names T and declares the components in F, G, E.

• $\Gamma \vdash \mathit{impexp} : F, G, E$. The import expression impexp imports the components in F, G, E from
$\Gamma$.

* The notation $\Gamma + T + \{\mathit{unitid} \mapsto T, F, G, E\}$ extends the state then the context in $\Gamma$ (see Appendix C).
• $B \vdash \mathit{intexp} \Rightarrow \Upsilon$. The interface expression intexp specifies the components in $\Upsilon$. Since
interface expressions do not describe signature declarations, the signature environment G in
$\Upsilon$ must be empty.

• $B \vdash \mathit{topspec} \Rightarrow F, E$. The top-level specification topspec specifies the components in F, E. No
identifier may be specified twice.

• $B \vdash \mathit{funspec} \Rightarrow F$. The functor specification funspec specifies the components in F. No
identifier may be specified twice.

• $\Gamma\ \mathrm{ok}$. The unit basis $\Gamma = (T, F, G, E), U$ is well-formed: T contains the free type names in
F, G, E, U.

• $\Gamma \vdash \Upsilon : \mathrm{Sig}$. The interface $\Upsilon = (T)(F, G, E)$ is well-formed: No type name $t \in T$ occurs in
$\Gamma$’s state $T'$ and $T \cup T'$ contains the free type names in F, G, E.

• $\Gamma \vdash \mathit{uspecs}\ \mathrm{ok}$. The unit specification list uspecs is well-formed. For example, the judgement

$$\Gamma \vdash \mathit{unitid}{:}\Upsilon,\ \mathit{uspecs}\ \mathrm{ok}$$

holds if $\Gamma \vdash \Upsilon : \mathrm{Sig}$ and $\Gamma + \mathrm{BT}(\Upsilon) \vdash \mathit{uspecs}\ \mathrm{ok}$.

• $E : \Sigma$ using $\varphi$. The environment E matches the signature $\Sigma$ using the realization $\varphi$.

• $\Sigma = \Sigma'$ using $\varphi$. The signatures $\Sigma = (T)E$ and $\Sigma' = (T')E'$ specify the same components and
$E = \varphi(E')$.

• $\Upsilon = \Upsilon'$ using $\varphi$. The interfaces $\Upsilon = (T)(F, G, E)$ and $\Upsilon' = (T')(F', G', E')$ specify the same
components and $F, G, E = \varphi(F', G', E')$.

• $F, G, E : \Upsilon$ using $\varphi$. The environments F, G, E match the interface $\Upsilon$ and the realization $\varphi$
induces the requisite sharing: If the semantic object A refers to type names bound in $\Upsilon$, then
the semantic object $\varphi(A)$ refers to corresponding types employed by F, G, E.
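
The well-formedness conditions for interfaces and unit specification lists stated above lend themselves to a short executable reading. The sketch below is an assumption-laden simplification rather than the rules of Appendix C: a unit basis is reduced to its state (a set of type names), and an interface is reduced to its set of bound type names together with the set of type names occurring free in its environments. The names `interface_ok` and `uspecs_ok` are hypothetical.

```python
def interface_ok(state: set, bound: set, free: set) -> bool:
    # No bound name occurs in the state, and every free type name of the
    # environments is either bound by the interface or already in the state.
    return bound.isdisjoint(state) and free <= (bound | state)


def uspecs_ok(state: set, uspecs: list) -> bool:
    # uspecs is a list of (unitid, bound names, free names) triples.
    state = set(state)
    for _unitid, bound, free in uspecs:
        if not interface_ok(state, bound, free):
            return False
        state |= bound  # later interfaces may refer to these names
    return True


good = [("A", {"a"}, {"a", "t_int"}), ("B", {"b"}, {"a", "b"})]
bad = [("B", {"b"}, {"a", "b"}), ("A", {"a"}, {"a", "t_int"})]
print(uspecs_ok({"t_int"}, good), uspecs_ok({"t_int"}, bad))
```

The second list is rejected because the interface of B mentions a type name bound by A before A has been specified, which mirrors the left-to-right scoping of bound type names in a unit specification list.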

4.2  Dynamic  Semantics 

We shall use the TD dynamic semantic judgements given in Figure 12.* These judgements have the
following meaning.

* In many cases, static and dynamic TDIL categories have the same names and employ the same metavariables.
No confusion can result since the static and dynamic semantics are separate.


Judgement                                                    Meaning

$s, B \vdash \mathit{strdec} \Rightarrow E/p, s'$            structure declaration evaluation
$B \vdash \mathit{fundec} \Rightarrow F$                     functor declaration evaluation
$IB \vdash \mathit{sigdec} \Rightarrow G$                    signature declaration elaboration
$IB \vdash \mathit{sigexp} \Rightarrow I$                    signature elaboration
$IB \vdash \mathit{spec} \Rightarrow I$                      specification elaboration
$\mathrm{Inter}\ B = IB$                                     interface basis extraction
$E \downarrow I = E'$                                        signature ascription

Figure 12: TD’s dynamic semantic judgements (summary)
• $s, B \vdash \mathit{strdec} \Rightarrow E/p, s'$. In state s and context B, the structure declaration strdec evaluates
to state $s'$ and either an environment E or an exception packet p.

• $B \vdash \mathit{fundec} \Rightarrow F$. The functor declaration fundec evaluates to the functor environment F.
State is not involved: F maps functor identifiers to functor closures.

• $IB \vdash \mathit{sigdec} \Rightarrow G$. The signature declaration sigdec declares the signature identifiers in
signature environment G.

• $IB \vdash \mathit{sigexp} \Rightarrow I$. The signature expression sigexp specifies the components in structure
interface I.

• $IB \vdash \mathit{spec} \Rightarrow I$. The specification spec specifies the components in I.

• $\mathrm{Inter}\ B = IB$. The interface basis IB is the value-free part of the basis B. (Values are not
needed by the signature elaboration judgements.)

• $E \downarrow I = E'$. The environment $E'$ is the environment E cut down to match the structure
interface I.


The  dynamic  semantics  is  defined  for  mostly  type-erased  TDEL.  Types,  type  ascriptions,  and 
type  qualifications  are  erased.  Signatures  are  not  erased.  Both  signature  ascription  and  functor 
application  limit  the  “view”  of  a  structure  in  case  it  is  opened.  Consider  the  following  declarations. 

    structure A = struct val x = 1 val y = 2 end
                  : sig val x : int end

    val y = 3
    open A

The  value  of  y  is  3  not  2.  A  related  example  employs  functors: 

    functor F(S : sig val x : int end) =
      struct
        val y = 3
        open S
      end

    structure B = F(struct val x = 1 val y = 2 end)

The value of B.y is 3 not 2. The dynamic semantics uses structure interfaces to cut down
environments when evaluating signature ascriptions and functor applications. To obtain structure
interfaces,  the  dynamic  semantics  re-elaborates  signatures.  This  “dynamic  elaboration”  tracks  only 
the  status  of  identifiers,  making  it  simpler  than  the  elaboration  performed  by  the  static  semantics. 
For  example,  it  is  stateless. 
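
The effect of cutting down an environment can be reproduced with a small model. In this sketch, which is only an illustration, a dynamic environment is a dictionary from value identifiers to values and a structure interface is just the set of identifiers it specifies; `cut_down` and `open_into` are hypothetical names standing in for $E \downarrow I$ and for the effect of `open`.

```python
def cut_down(env: dict, interface: set) -> dict:
    # Keep only the components the interface specifies.
    return {name: value for name, value in env.items() if name in interface}


def open_into(scope: dict, structure: dict) -> dict:
    # `open` rebinds every component of the structure in the current scope.
    return {**scope, **structure}


# structure A = struct val x = 1 val y = 2 end : sig val x : int end
A = cut_down({"x": 1, "y": 2}, {"x"})
scope = {"y": 3}              # val y = 3
scope = open_into(scope, A)   # open A


def F(arg: dict) -> dict:
    # The argument is viewed through `sig val x : int end`.
    s = cut_down(arg, {"x"})
    body = {"y": 3}           # val y = 3
    return open_into(body, s) # open S


B = F({"x": 1, "y": 2})
print(scope["y"], B["y"])
```

Without the two calls to `cut_down`, opening the structure would rebind y to 2 in both examples; with them, y keeps the value 3, in agreement with the text.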

In the evaluation judgement

$$s, B \vdash \mathit{strdec} \Rightarrow E/p, s',$$

the states s and $s'$ track a set of generated exception names and a memory graph for references.
The dynamic semantics accounts for the generativity of exception bindings by stamping exception
values with exception names: two TDIL exceptions are equal if they are stamped with the same
name. The rules propagate raised exceptions explicitly, referring to them as exception packets p.
(Compound metavariables like E/p range over the disjoint union of two TDIL categories.)
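$$\begin{array}{rcl}
UI\ \text{or}\ FI, I & \in & \mathrm{UnitInt} = \mathrm{FunIntEnv} \times \mathrm{Int} \\
FI & \in & \mathrm{FunIntEnv} = \mathrm{FunId} \xrightarrow{\mathrm{fin}} \mathrm{Int} \times \mathrm{Int} \\
\Gamma & \in & \mathrm{UnitBasis} = \mathrm{Basis} \times \mathrm{UnitEnv} \\
U & \in & \mathrm{UnitEnv} = \mathrm{UnitId} \xrightarrow{\mathrm{fin}} \mathrm{Basis}
\end{array}$$

Figure 13: Unit dynamic semantic objects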


Unit  Dynamic  Semantics.  In  Figure  13  we  extend  the  dynamic  TDIL  for  unit  declaration  lists. 
These  TDIL  categories — disjoint  from  all  others — build  on  TD’s  dynamic  TDIL  categories  Basis 
and  Int  (see  Appendix  D). 

As with TDEL signatures, interface expressions are not erased prior to evaluation. An SC
import declaration import unitid : intexp is analogous to a TDEL signature ascription and an
open declaration: It limits the “view” of the imported unit. We shall give a dynamic semantics
that uses (dynamic) unit interfaces to cut down bases when elaborating SC imports.

A  unit  interface  UI  =  FI,  I  describes  a  unit.  The  structure  interface  I  describes  structure,  type, 
and  value  components.  The  functor  interface  environment  FI  describes  functor  components  using 
functor  interfaces.  A  functor  interface  I,  I'  comprises  argument  and  result  structure  interfaces. 
Both  are  necessary.  Consider  the  following  unit  declaration  list. 

    unit U1 =
      unit
        functor F(S : sig end) = struct open S val x = 1 val y = 2 end
      end,

    unit U2 =
      unit
        import U1 :
          intf
            functor F(S : sig val x : int end) : sig val x : int end
          end

        structure A = F(struct val x = 3 end)
        val y = 4
        open A
      end

The values of x and y in U2 are, respectively, 1 and 4 rather than 3 and 2.
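
The unit example can be traced with the same kind of toy model. The sketch assumes, as a simplification of the dynamic semantics, that an imported functor is wrapped so that its argument is cut down by the imported argument interface and its result by the imported result interface; environments are dictionaries and interfaces are sets of identifiers. The helper names `cut_down`, `declare_functor`, and `import_functor` are hypothetical.

```python
def cut_down(env: dict, interface: set) -> dict:
    return {name: value for name, value in env.items() if name in interface}


def declare_functor(arg_interface: set, body):
    # A functor closure views its argument through its declared interface.
    return lambda arg: body(cut_down(arg, arg_interface))


def import_functor(functor, arg_interface: set, result_interface: set):
    # The import limits the view of both the argument and the result.
    return lambda arg: cut_down(functor(cut_down(arg, arg_interface)),
                                result_interface)


# unit U1: functor F(S : sig end) = struct open S val x = 1 val y = 2 end
def f_body(s: dict) -> dict:
    env = dict(s)   # open S
    env["x"] = 1    # val x = 1
    env["y"] = 2    # val y = 2
    return env


F = declare_functor(set(), f_body)

# unit U2: import U1 : intf functor F(S : sig val x : int end) : sig val x : int end end
F_imported = import_functor(F, {"x"}, {"x"})
A = F_imported({"x": 3})   # structure A = F(struct val x = 3 end)
u2 = {"y": 4}              # val y = 4
u2.update(A)               # open A
print(u2)
```

Because the imported result interface mentions only x, opening A cannot rebind y, so U2 ends with x bound to 1 and y bound to 4.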

A unit basis $\Gamma = B, U$ serves as an evaluation context. The unit environment U is a finite map
from unit identifiers to bases: If $U(\mathit{unitid}) = F, G, E$, then the environments F, G, E record the
values obtained by evaluating unitid.

We give a dynamic semantics to unit declaration lists by giving rules for the evaluation judgements
in Figure 14. The rules appear in Appendix D. These judgements have the following
meaning.

• $s, \Gamma \vdash \mathit{udecs} \Rightarrow \Gamma'/p, s'$. Evaluate the unit declaration list udecs to a unit basis $\Gamma'$ or an
exception packet p.

• $s, \Gamma \vdash \mathit{topdec} \Rightarrow B/p, s'$. Evaluate the top-level declaration topdec to a basis B or an exception
packet p.
Judgement                                                        Meaning

$s, \Gamma \vdash \mathit{udecs} \Rightarrow \Gamma'/p, s'$      unit evaluation
$s, \Gamma \vdash \mathit{topdec} \Rightarrow B/p, s'$           top-level declaration evaluation
$\Gamma \vdash \mathit{impexp} \Rightarrow B$                    import expression evaluation
$IB \vdash \mathit{topspec} \Rightarrow UI$                      top-level specification elaboration
$IB \vdash \mathit{funspec} \Rightarrow FI$                      functor specification elaboration

Figure 14: Unit evaluation judgements


• $\Gamma \vdash \mathit{impexp} \Rightarrow B$. Evaluate the import expression impexp to the basis B.

• $IB \vdash \mathit{topspec} \Rightarrow UI$. The top-level specification topspec specifies the components in the unit
interface UI.

• $IB \vdash \mathit{funspec} \Rightarrow FI$. The functor specification funspec specifies the components in the functor
interface environment FI.

4.3  Linking 

We define linking for SMLSC by giving rules for deriving the judgements in Figure 15. A linkset

$$\mathit{uspecs}_0 \to \mathit{exps}$$

comprises imports $\mathit{uspecs}_0$ and exports exps. Exports may take two forms: a unit declaration list
$\mathit{udecs} : \mathit{uspecs}$ or a static TDIL basis B.

• The imports $\mathit{uspecs}_0$ describe the units on which the linkset depends; they must be
well-formed in the ambient context and no unit may be described twice. For example, the imports

$$\mathit{uspecs}_{AB} = A{:}\Upsilon_A,\ B{:}\Upsilon_B$$

express dependency on units A and B.

Imports specify assumptions to be satisfied by linking. A linkset with imports $\mathit{uspecs}_{AB}$
assumes unit B declares (at least) the components described by the interface $\Upsilon_B$ but can be
linked with (a linkset exporting) a unit B providing more components.

• The exports $\mathit{exps} = \mathit{udecs} : \mathit{uspecs}$ are the code associated with the linkset. They may make
reference to the linkset’s imports (via free unit identifiers and type names).


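Judgement                                                              Meaning

$\Gamma \vdash L\ \mathrm{ok}$                                         $L$ is well-formed
$\Gamma \vdash \mathit{exps}\ \mathrm{ok}$                             exps is well-formed
$L \Rightarrow \mathit{udecs}$                                         $L$ completes to udecs
$\Gamma \vdash L +\!\!+ L' \Rightarrow L''$                            $L$ and $L'$ merge to $L''$
$\Gamma \vdash \mathit{exps} +\!\!+ \mathit{exps}' \Rightarrow \mathit{exps}''$      exps and exps' merge to exps''

Figure 15: Linking judgements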
$L ::= \mathit{uspecs} \to \mathit{exps}$                  linkset

$\mathit{exps} ::= \mathit{udecs} : \mathit{uspecs}$       units
$\quad\mid B$                                              top-level components

Figure 16: Linkset syntax

• The exports $\mathit{exps} = B$ arise during elaboration and record typing information for the top-level
declaration associated with the linkset. They may make reference to the linkset’s imports
(via free type names).


The dynamic semantics for SMLSC is very simple. The completion judgement $L \Rightarrow \mathit{udecs}$
translates a linkset

$$\cdot \to \mathit{udecs} : \mathit{uspecs}$$

with no imports to the unit declaration list udecs. Under the dynamic semantics for units given
in Section 4.2, the resulting unit declaration list evaluates the linkset’s exports from left to right
for their side-effects.* Evaluation terminates when an uncaught exception is raised or when every
export has been evaluated.
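
The evaluation order described here can be pictured with a short sketch. It is a loose analogy rather than the rules of Appendix D: each unit is modelled as a Python function from the units evaluated so far to a value, and a Python exception plays the role of an exception packet. The name `evaluate` and the sample units are hypothetical.

```python
def evaluate(units):
    """units: list of (unitid, function from the basis so far to a value)."""
    basis = {}
    for unitid, body in units:
        try:
            basis[unitid] = body(basis)
        except Exception as packet:  # an uncaught exception ends evaluation
            return basis, packet
    return basis, None


log = []


def u1(basis):
    log.append("U1")
    return 1


def u2(basis):
    log.append("U2")
    raise RuntimeError("Fail")


def u3(basis):
    log.append("U3")
    return basis["U1"] + 1


basis, packet = evaluate([("U1", u1), ("U2", u2), ("U3", u3)])
print(log, basis, packet)
```

Units are run strictly from left to right, and the unit after the failing one is never evaluated.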

We  give  the  full  syntax  for  linksets  in  Figure  16  and  the  rules  in  Appendix  E.  The  remainder 
of  this  section  explains  the  rules  for  linkset  merge. 


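Notation. We define the extension of a unit basis by a unit specification list, $\Gamma + \mathit{uspecs}$, by†

$$\begin{array}{rcl}
\Gamma + \cdot & = & \Gamma \\
\Gamma + (\mathit{unitid}{:}(T)(F, G, E),\ \mathit{uspecs}) & = & (\Gamma + T + \{\mathit{unitid} \mapsto T, F, G, E\}) + \mathit{uspecs}.
\end{array}$$

We define the type names bound by a unit specification list, BT(uspecs), by

$$\mathrm{BT}(\mathit{unitid}_1{:}\Upsilon_1, \ldots, \mathit{unitid}_n{:}\Upsilon_n) = \mathrm{BT}(\Upsilon_1) \cup \cdots \cup \mathrm{BT}(\Upsilon_n).$$

We define the domain of a unit specification list, dom(uspecs), by

$$\mathrm{dom}(\mathit{unitid}_1{:}\Upsilon_1, \ldots, \mathit{unitid}_n{:}\Upsilon_n) = \{\mathit{unitid}_1, \ldots, \mathit{unitid}_n\}$$

and the domain of a linkset’s exports, dom(exps), by

$$\begin{array}{rcl}
\mathrm{dom}(\mathit{udecs} : \mathit{uspecs}) & = & \mathrm{dom}(\mathit{uspecs}) \\
\mathrm{dom}(B) & = & \emptyset.
\end{array}$$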
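
These definitions are simple folds over a unit specification list, which the following sketch makes explicit. It assumes a pared-down representation that is not part of the paper: a unit basis is a pair of a state (set of type names) and a unit environment, and an interface carries its bound type names plus an opaque stand-in for the environments F, G, E. The names `Interface`, `bound_names`, `dom`, and `extend` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    bound: frozenset   # T, the bound type names
    components: tuple  # stands in for the environments (F, G, E)


def bound_names(uspecs):
    # BT(uspecs): the union of the bound names of every interface.
    names = set()
    for _unitid, intf in uspecs:
        names |= intf.bound
    return names


def dom(uspecs):
    return {unitid for unitid, _intf in uspecs}


def extend(basis, uspecs):
    # basis + uspecs: extend the state, then the unit environment, per unit.
    state, unit_env = set(basis[0]), dict(basis[1])
    for unitid, intf in uspecs:
        state |= intf.bound
        unit_env[unitid] = (intf.bound, intf.components)
    return state, unit_env


uspecs = [("A", Interface(frozenset({"a"}), ("F_A", "G_A", "E_A"))),
          ("B", Interface(frozenset({"b"}), ("F_B", "G_B", "E_B")))]
state, unit_env = extend(({"t_int"}, {}), uspecs)
print(sorted(bound_names(uspecs)), sorted(dom(uspecs)), sorted(state))
```

Extending the empty specification list leaves the basis unchanged, matching the first defining equation.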

Linkset merge. The rules for linkset merge $\Gamma \vdash L_1 +\!\!+ L_2 \Rightarrow L_3$ combine $L_1$ and $L_2$ to produce
$L_3$. The rules presuppose that $L_1$ is well-formed with respect to $\Gamma$ but permit $L_2$ to make reference
not only to $\Gamma$ but to the imports and exports of $L_1$.

The rules process the imports in $L_2$ from left to right. If $L_2$ has no imports, then the following
rule applies.

$$\frac{\Gamma \vdash \mathit{exps} +\!\!+ \mathit{exps}' \Rightarrow \mathit{exps}''}
{\Gamma \vdash (\mathit{uspecs}_0 \to \mathit{exps}) +\!\!+ (\cdot \to \mathit{exps}') \Rightarrow \mathit{uspecs}_0 \to \mathit{exps}''}$$

$L_3$ imports what $L_1$ does. The rules for $\Gamma \vdash \mathit{exps} +\!\!+ \mathit{exps}' \Rightarrow \mathit{exps}''$ ensure that $L_3$ exports what $L_1$
and $L_2$ do.

* The unit declaration list udecs obtained by completion may be evaluated in the initial state $s_0$ and the unit basis
$\Gamma_0 = B_0, \{\}$, where $s_0$ and the initial dynamic basis $B_0$ are given in TD [16, Appendix D].

† Please see Appendix C for a summary of TDIL notation, including definitions of $\Gamma + T$, $\Gamma + U$, and $\Gamma(\mathit{unitid})$.

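Otherwise, the rules examine the first import $\mathit{unitid}{:}\Upsilon$ in $L_2$ and distinguish three mutually
exclusive cases:

• $L_1$ exports unitid.

$$\frac{\begin{array}{c}
\mathit{unitid} \in \mathrm{dom}(\mathit{uspecs}) \\
(\Gamma + \mathit{uspecs}_0 + \mathit{uspecs})(\mathit{unitid}) = T', F', G', E' \\
F', G', E' : \Upsilon\ \text{using}\ \varphi \\
L' := \varphi(\mathit{uspecs}_1 \to \mathit{udecs}' : \mathit{uspecs}') \\
\Gamma \vdash (\mathit{uspecs}_0 \to \mathit{udecs} : \mathit{uspecs}) +\!\!+ L' \Rightarrow L''
\end{array}}
{\Gamma \vdash (\mathit{uspecs}_0 \to \mathit{udecs} : \mathit{uspecs}) +\!\!+ (\mathit{unitid}{:}\Upsilon,\ \mathit{uspecs}_1 \to \mathit{udecs}' : \mathit{uspecs}') \Rightarrow L''}$$

The first premise ensures $L_1$ exports unitid. The second premise picks out the $L_1$ export
$\mathit{unitid} : (T')(F', G', E')$ for unitid. The third premise matches the exported environments
$F', G', E'$ to the imported interface $\Upsilon$. Linking fails if no match is possible; otherwise, $L'$ is
constructed by changing references to the type names bound by $\Upsilon$ in the remainder of $L_2$ to
the types employed by $L_1$.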

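• L1 imports unitid but does not export it.

\[
\frac{\begin{array}{c}
\mathit{unitid} \notin \mathrm{dom}(\mathit{exps}) \\
\mathit{uspecs}_0 = \mathit{uspecs}'', \mathit{unitid}{:}\mathcal{I}', \mathit{uspecs}''' \\
\mathcal{I}' = \mathcal{I} \text{ using } \varphi \\
L' := \varphi(\mathit{uspecs}_1 \to \mathit{exps}') \\
\Gamma \vdash (\mathit{uspecs}_0 \to \mathit{exps}) \mathbin{+\!+} L' \Rightarrow L''
\end{array}}
{\Gamma \vdash (\mathit{uspecs}_0 \to \mathit{exps}) \mathbin{+\!+} (\mathit{unitid}{:}\mathcal{I}, \mathit{uspecs}_1 \to \mathit{exps}') \Rightarrow L''}
\]

The first premise ensures L1 does not export unitid. The second premise picks out the L1 import unitid:ℐ'; there can be at most one since L1 is well-formed. Linking fails if ℐ and ℐ' are not equivalent; otherwise, L' is constructed by changing references to the type names bound by ℐ in the remainder of L2 to the type names bound by ℐ'.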

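• L1 neither imports nor exports unitid.

\[
\frac{\begin{array}{c}
\mathit{unitid} \notin \mathrm{dom}(\mathit{exps}) \cup \mathrm{dom}(\mathit{uspecs}_0) \\
\mathcal{I}' = \mathcal{I} \text{ using } \varphi \\
\Gamma + \mathrm{BT}(\mathit{uspecs}_0) \vdash \mathcal{I}' : \mathrm{Sig} \\
L := \mathit{uspecs}_0, \mathit{unitid}{:}\mathcal{I}' \to \mathit{exps} \\
L' := \varphi(\mathit{uspecs}_1 \to \mathit{exps}') \\
\Gamma \vdash L \mathbin{+\!+} L' \Rightarrow L''
\end{array}}
{\Gamma \vdash (\mathit{uspecs}_0 \to \mathit{exps}) \mathbin{+\!+} (\mathit{unitid}{:}\mathcal{I}, \mathit{uspecs}_1 \to \mathit{exps}') \Rightarrow L''}
\]

The first premise ensures that L1 neither imports nor exports unitid. The next two premises choose an interface ℐ' equivalent to ℐ but well-formed without reference to the exports of L1. Linking fails if no such interface exists, which happens when type names exported by L1 occur in ℐ. Otherwise, L is constructed by adding a new import to the imports in L1, and L' is constructed by changing references to the type names bound in ℐ in the remainder of L2 to the type names bound in ℐ'.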
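The three cases above can be sketched in Python under strong simplifying assumptions. The names `Linkset`, `LinkError`, and `merge` are hypothetical and not part of SMLSC. An interface is modeled as a plain set of names, interface matching as set inclusion, and interface equivalence as set equality, so the realization and the renaming of type names are not modeled, and neither is the failure condition of the third case. Letting a later export shadow an earlier one is also an assumption of this sketch.

```python
from dataclasses import dataclass, field


class LinkError(Exception):
    """Raised when two linksets cannot be merged."""


@dataclass
class Linkset:
    # Both maps take a unit identifier to an interface, modeled here as a
    # frozenset of names (an assumption; real interfaces carry types).
    imports: dict = field(default_factory=dict)
    exports: dict = field(default_factory=dict)


def merge(l1: Linkset, l2: Linkset) -> Linkset:
    """Merge l2 into l1, examining the imports of l2 from left to right."""
    imports = dict(l1.imports)
    for unitid, iface in l2.imports.items():
        if unitid in l1.exports:
            # Case 1: l1 exports unitid, so the export must match the import.
            if not iface <= l1.exports[unitid]:
                raise LinkError(f"{unitid}: export does not match import")
        elif unitid in l1.imports:
            # Case 2: both linksets import unitid, so the interfaces must agree.
            if iface != l1.imports[unitid]:
                raise LinkError(f"{unitid}: imported at disparate interfaces")
        else:
            # Case 3: unitid is new to l1, so it becomes an import of the result.
            imports[unitid] = iface
    # The result exports what l1 and l2 export (later exports shadow earlier ones).
    return Linkset(imports, {**l1.exports, **l2.exports})
```

An import satisfied by an export of the first linkset disappears from the result, while an import unknown to the first linkset is carried over.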
Judgement...            Meaning...
project ⇒ L             project elaboration
Γ ⊢ srcunit ⇒ L         unit elaboration
Γ ⊢ topdec ⇒ L          top-level declaration elaboration
Γ ⊢ impexp ⇒ L          import expression elaboration

Figure 17: Elaboration judgements


4.4  Elaboration 

We  define  a  semantics  for  SMLSC  by  giving  rules  for  the  elaboration  judgements  in  Figure  17.  We 
give  the  abstract  syntax  for  SMLSC  in  Figure  18.  The  elaboration  rules  appear  in  Appendix  F. 
These  judgements  have  the  following  meaning. 

• project ⇒ L. Elaborate project, using linkset merge to accumulate a resulting linkset L. A source unit is elaborated in a unit basis Γ that declares the imports and exports in L.

• Γ ⊢ srcunit ⇒ L. Elaborate the topdec in srcunit to the linkset

udecs0 → T, F, G, E.

The imports udecs0 arise from the import declarations in topdec. The type names T arise from the types generated by topdec. The environments F, G, and E arise from the declarations in topdec. The result, L, exports a single unit:

L = udecs0 → srcunit : (T)(F, G, E).

• Γ ⊢ topdec ⇒ L. Elaborate topdec using linkset merge.

• Γ ⊢ impexp ⇒ L. Elaborate impexp using context lookup and intexp elaboration.


5  Implementation 

The  semantics  of  SMLSC  avoids  commitment  to  the  meaning  of  “compilation,”  “linking,”  and 
“completion”  to  ensure  compatibility  with  various  implementation  strategies.  These  phases  may 
be  implemented  using  classical  methods  (code  generation  during  compilation,  object  code  weaving 
during  linking,  and  writing  an  executable  for  completion),  or  in  other,  more  novel,  ways  (such  as 
type  checking  during  compilation,  and  code  generation  during  linking).  The  design  is,  as  far  as  we 
know,  implementable  in  all  current  Standard  ML  compilers  without  requiring  radical  changes  to 
their  infrastructure. 


project ::= ·                    empty
          | project, srcunit     source unit
          | project, L           compiled unit(s)

Figure 18: SMLSC abstract syntax
Parallel Build. A compiler can exploit interfaces to support parallel compilation in order to speed up system build times. A unit can be compiled once interfaces have been inferred for its IC imports. The TILT compiler, which implements an earlier version of the present extension, uses such a strategy. It also implements cut-off incremental recompilation [1]: it interrupts the normal cascade of recompilation when a source change does not cause a unit's interface to change.
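The cut-off idea can be illustrated with a small Python sketch. This is not TILT's implementation; the function `cutoff_rebuild` and its parameters are hypothetical, and a unit's interface is modeled as an opaque value compared by equality. Units are visited in dependency order, and a unit is recompiled only if its own source changed or if the interface of one of its imports changed.

```python
def cutoff_rebuild(order, deps, dirty, old_ifaces, compile_unit):
    """Return the units that are recompiled, in the order they are compiled.

    order        -- all units, listed so that imports precede their clients
    deps         -- maps a unit to the units it imports
    dirty        -- units whose source text changed
    old_ifaces   -- maps a unit to the interface from the previous build
    compile_unit -- callback that compiles a unit and returns its new interface
    """
    iface_changed = set()
    recompiled = []
    for unit in order:
        imports_changed = any(d in iface_changed for d in deps.get(unit, ()))
        if unit in dirty or imports_changed:
            new_iface = compile_unit(unit)
            recompiled.append(unit)
            # Cut-off: clients are rebuilt only if the interface differs.
            if new_iface != old_ifaces.get(unit):
                iface_changed.add(unit)
    return recompiled
```

If a source change leaves the interface of a unit intact, none of its clients are recompiled; if the interface changes, the cascade proceeds only as far as interfaces keep changing.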

Parsing. This presentation of SMLSC provides concrete and abstract syntax, but does not formalize parsing. The only issue that entangles separate compilation and parsing is fixity declarations. To support fixity declarations at parse-time, we include a parsing context in the concrete representation of linksets (object files). A source unit that is incrementally compiled against a linkset is parsed using that linkset's included parsing context. We do not permit fixity specifications in user-specified interfaces, and therefore they do not affect interface matching or any other part of the semantics.

Note  that  a  library  may  specify  fixity  information  by  placing  appropriate  declarations  in  the 
handoff  unit.  For  example,  to  describe  a  matrix  library  that  supplies  an  infix  **  operator  for 
multiplication,  we  may  write  the  following  handoff  unit: 

    unit Matrices =
    unit
      import MatricesImpl :
      intf
        type matrix
        val ** : matrix * matrix -> matrix
        (* ... *)
      end

      infix **
    end


6  Multiple  Interfaces  for  the  Same  Import 

In  Section  2  we  presented  the  programming  methodology  of  handoff  units.  As  long  as  two  linksets 
that  import  the  same  unit  identifier  do  so  by  using  the  same  handoff  unit,  they  will  always  agree 
on  the  interface  for  that  unit  and  so  can  be  linked  together.  However,  in  some  situations  it  may 
be  useful  to  permit  two  clients  to  import  the  same  unit,  each  with  a  different  interface.  Since 
interface  matching,  like  signature  matching,  is  coercive,  this  complicates  the  methodology  of  definite 
references  by  introducing  “views”  of  the  same  underlying  unit. 

For example, suppose that two linksets L1 and L2 import the same unit MathLib at disparate interfaces I1 and I2. This may happen because the developers of L1 and L2 compiled using different versions of the handoff unit for MathLib, or because the developers wrote their import interfaces by hand. The link

link(L1, L2)

fails because the linksets are required to agree on the interfaces of their common imports. Aside from recompiling the two linksets to use the same interface, the programmer has several options for resolving this situation. First, she can satisfy the imports by providing the implementation of MathLib:

L1' = link(MathLib, L1)

L = link(L1', L2)

The first step satisfies the SC import of MathLib in L1, as long as the actual interface of MathLib matches the import interface I1. The result L1' does not import MathLib, so it does not conflict with the import of MathLib in L2. L1' does export MathLib, so if the actual interface of MathLib matches I2, then the second link succeeds. Because linking is left-associative, L = link(MathLib, L1, L2) accomplishes the same thing.

Figure 19: The linkset Lglue imports MathLib at interface I and then exports it to satisfy the imports in L1 and L2 at disparate interfaces I1 and I2.

Any implementation of MathLib that satisfies both I1 and I2 will suffice. Because we do not require unit names to be globally unique, this implementation of MathLib might even import MathLib (again) and then contain some glue code to make it compatible with the two given interfaces I1 and I2 (Figure 19). We expect such cases to be uncommon, the preferred methodology being to use a single handoff unit for all clients.
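The MathLib scenario can be replayed in a toy Python model. Everything here is an assumption made for illustration: a linkset is a pair of dictionaries (imports, exports), an interface is a set of value names, matching is set inclusion, and the function names `merge` and `link` are hypothetical. The sketch only shows why the order of linking matters.

```python
from functools import reduce


def merge(l1, l2):
    """Merge two linksets, each a pair (imports, exports) of dicts."""
    imports1, exports1 = l1
    imports2, exports2 = l2
    imports = dict(imports1)
    for unit, wanted in imports2.items():
        if unit in exports1:
            # The export must provide at least what the import asks for.
            if not wanted <= exports1[unit]:
                raise ValueError(f"{unit}: export does not match the import")
        elif unit in imports1:
            # Common imports must agree on their interface.
            if wanted != imports1[unit]:
                raise ValueError(f"{unit}: imported at disparate interfaces")
        else:
            imports[unit] = wanted
    return imports, {**exports1, **exports2}


def link(*linksets):
    """Left-associative linking: link(a, b, c) is merge(merge(a, b), c)."""
    return reduce(merge, linksets)


I1 = frozenset({"sqrt"})
I2 = frozenset({"sqrt", "pow"})
L1 = ({"MathLib": I1}, {"Client1": frozenset({"run"})})
L2 = ({"MathLib": I2}, {"Client2": frozenset({"run"})})
MATHLIB = ({}, {"MathLib": frozenset({"sqrt", "pow", "exp"})})
```

In this model `link(L1, L2)` raises an error because the two imports of MathLib disagree, whereas `link(MATHLIB, L1, L2)` succeeds and leaves no imports, since each client's import is checked against the implementation rather than against the other client's import.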

7  Related  Work 

There  are  several  closely  related  systems  that  influenced  the  design  of  SMLSC. 

The notion of linkset in SMLSC comes from Cardelli's investigation of separate compilation and type-safe linking in the simply-typed λ-calculus [5]. Our formalization of linking extends these ideas to support the Standard ML module system including signature subtyping, abstract types, and module and type definitions in structures.

Harper and Pierce [11] discuss language design for module systems, including separate compilation. Particularly relevant to the current work is their discussion of sharing of abstract types. They describe the use of definite references to avoid the coherence problems (and excess sharing specifications) that arise from aliasing.

The notion of a handoff unit bears some resemblance to the use of .h files in C. The presence of function prototypes in a .h file provides an interface for application code that includes that header file. Code that references a prototyped function triggers a link-time demand for that function. The degree of link-time type-checking varies across C implementations. Usually, type correctness is assured by programming conventions.

Glew and Morrisett [8] describe separate compilation for Typed Assembly Language [19]. Their language, MTAL, permits type definitions, abstract types, and polymorphic types in interfaces and supports recursive linking.

Jim [14] describes a λ-calculus P2 with rank 2 intersection types that has principal typings. The principal typings property means that from a term M, one can infer both Γ and τ such that any typing derivation Γ' ⊢ M : τ' is an instance of Γ ⊢ M : τ. In a system with principal typings, program fragments can be separately compiled without context information, meaning that SC imports need not even specify interfaces. Standard ML, however, does not have principal typings. It remains an open problem to design a type system that supports principal types for features such as abstract and recursive types, and type definitions in modules.

Objective  Caml.  The  separate  compilation  system  of  Objective  Caml  (O’Caml)  [15]  is  similar 
in  many  regards  to  SMLSC.  The  declaration  of  a  unit  U  is  an  O’Caml  module  stored  within  a  file 
called  U.ml.  The  interface  for  U  may  optionally  be  given  in  a  file  called  U.mli.  If  the  interface  is 
present,  other  units  depending  on  U  can  compile  even  if  the  implementation  is  not  available,  just 
as  in  SMLSC.  Because  the  filename  of  an  interface  indicates  the  unit  that  it  describes,  O’Caml 
interfaces  play  the  role  of  handoff  units  in  SMLSC.  Additionally,  O’Caml’s  use  of  the  filesystem  to 
provide  a  canonical  location  for  each  unit  and  interface  means  that  all  unit  references  are  definite. 

On  the  other  hand,  O’Caml’s  dependence  on  the  filesystem  means  that  the  language  is  not 
independent  from  its  environment.  For  instance,  unit  names  are  limited  to  valid  filenames  on  the 
host  system,  and  restructuring  a  project  on  disk  may  force  changes  to  the  code.  Another  significant 
difference  is  that  O’Caml  conflates  the  notions  of  units  and  modules.  This  earns  O’Caml  some 
conceptual  economy,  but  it  makes  it  impossible  to  separate  the  notions  of  top-level  declarations  and 
structure  components.  This  makes  it  necessary  to  support  signature  and  functor  definitions  within 
structures,  so  such  a  choice  would  not  be  compatible  with  our  design  principle  of  conservativity 
over  Standard  ML.  Finally,  unlike  SMLSC,  O’Caml  and  its  separate  compilation  system  are  defined 
informally  in  terms  of  their  implementation. 

Moscow  ML.  The  Moscow  ML  [20]  compiler  for  Standard  ML  supports  a  separate  compilation 
system  nearly  identical  to  Objective  Caml’s.  Moscow  ML  extends  the  Standard  ML  module  system 
to  allow  (among  other  things)  functor  and  signature  declarations  in  structures  and  specifications 
for  them  in  signatures.  Then,  like  O’Caml,  units  are  structures.  In  contrast,  SMLSC  does  not 
require  any  changes  to  the  Standard  ML  module  language. 

Other Standard ML implementations include mechanisms for breaking programs up into compilation units. None support separate compilation in the sense we use it here; they use the term to mean cut-off incremental recompilation (recall Section 5).

SML/NJ CM. The Compilation Manager for Standard ML of New Jersey (CM) [3] is a tool for compiling Standard ML programs spread across many source files. CM permits a program to be divided into a hierarchy of libraries [4]. A library comprises a list of imported libraries, Standard ML source files, and a list of symbols exported by the library. Dependencies between libraries are explicit but dependencies among the source files in a library are inferred [2, 9]. CM provides control over the identifiers visible to a source file, and supports conditional compilation, parallel compilation, and cut-off incremental recompilation. CM provides no way for the programmer to write interfaces or to compile against unimplemented units. SMLSC is not a replacement for CM; we believe that dependency analysis and recompilation tools are useful, and that SMLSC provides a good linguistic target for such tools.

ML Basis. The MLton compiler [18] and ML Kit [17] implement a language called ML Basis. A "basis" in their terminology is what we call a unit. An ML Basis program is a series of declarations, including a binding construct for bases and an open construct for basis identifiers. These are analogous to SMLSC's unit declaration and IC import declaration. Like SMLSC, the order of compilation entities is explicit, and thus each program has unambiguous meaning. ML Basis is given a formal semantics [6] in terms of The Definition of Standard ML. The implementation of ML Basis in the ML Kit supports cut-off incremental recompilation based on Elsman's thesis work [7]. Like CM, ML Basis does not provide a way for programmers to write down interfaces or separately compile against unimplemented bases.


8  Conclusion 

We have presented an extension to Standard ML for separate compilation called SMLSC. Its focus is the unit, a program fragment that can depend on other program fragments through either separate or incremental compilation. Via the programming idiom of handoff units, which uses both separate and incremental compilation, we limit the number and complexity of linguistic mechanisms while supporting a convenient programming style. Our formal and abstract definition of the language ensures that it is unambiguously specified, and admits a variety of implementation strategies.
A TS Linking Rules

The typed semantics defines a closed structure mod_basis : sig_basis serving as an initial basis for the TSIL. The elaborator assumes Γ declares basis:sig_basis, which includes components such as the built-in Match exception. This basis structure is introduced in Rule 6 for completion and Rule 12 for elaboration of source units in projects.

We  use  the  following  definitions  and  notation. 

• Writing BV(dec) for the variable declared by dec, we define the bound variables of a structure declaration list, BV(sdecs), by

\[
\mathrm{BV}(\mathit{lab}_1 \triangleright \mathit{dec}_1, \ldots, \mathit{lab}_n \triangleright \mathit{dec}_n) = \{\mathrm{BV}(\mathit{dec}_1), \ldots, \mathrm{BV}(\mathit{dec}_n)\}.
\]

A linkset

\[
L = \mathit{sdecs}_0 \to \mathit{sbnds} : \mathit{sdecs}; S
\]

binds variables BV(sdecs0) with scope sbnds : sdecs; S and variables BV(sdecs) with scope S. We write BV(L) for BV(sdecs0) ∪ BV(sdecs).

•  For  readability,  we  sometimes  elide  variables  in  structure  bindings  and  declarations.  It  should 
be  immediately  obvious  how  to  consistently  restore  these  with  fresh  variables. 

•  We  assume  that  unit  identifiers  are  disjoint  from  all  other  identifier  classes. 

• We assume that the TS overbar injection $\overline{\,\cdot\,}$ maps identifiers of different classes to different labels and that there are infinitely many labels not in its range. We assume that its range includes neither the distinguished label basis nor the labels chosen fresh in the rules.

•  Structure  declaration  lists  sdecs,  signature  abbreviations  S,  and  so  on  specify  lists  of  elements. 
We  adopt  the  following  notation  for  lists. 

—  We  denote  by  (•,  •)  the  operation  of  syntactic  concatenation;  for  example,  S,  S'. 

—  We  sometimes  use  pattern  matching  at  the  left  end  of  a  list,  writing  sigid=sig,  S  to 
match  the  first  binding  in  the  list. 

—  We  usually  omit  the  initial  •;  for  example, 

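\[
\mathit{sigid}_1{=}\mathit{sig}_1, \ldots, \mathit{sigid}_n{=}\mathit{sig}_n.
\]

• We define the domain of a signature abbreviation, dom(S), by

\[
\begin{array}{rcl}
\mathrm{dom}(\cdot) &=& \emptyset \\
\mathrm{dom}(S, \mathit{sigid}{=}\mathit{sig}) &=& \mathrm{dom}(S) \cup \{\mathit{sigid}\} \\
\mathrm{dom}(S, \mathit{unitid}{=}S_u) &=& \mathrm{dom}(S) \cup \{\mathit{unitid}\}.
\end{array}
\]

• We define the function S ++ S' by

\[
\begin{array}{rcl}
(\cdot \mathbin{+\!+} S') &=& S' \\[4pt]
((\mathit{sigid}{=}\mathit{sig}, S) \mathbin{+\!+} S') &=&
  \begin{cases}
    \mathit{sigid}{=}\mathit{sig}, S'' & \text{if } \mathit{sigid} \notin \mathrm{dom}(S'') \\
    S'' & \text{otherwise}
  \end{cases}
  \quad \text{where } S'' = S \mathbin{+\!+} S' \\[4pt]
((\mathit{unitid}{=}S_u, S) \mathbin{+\!+} S') &=&
  \begin{cases}
    \mathit{unitid}{=}S_u, S'' & \text{if } \mathit{unitid} \notin \mathrm{dom}(S'') \\
    S'' & \text{otherwise}
  \end{cases}
  \quad \text{where } S'' = S \mathbin{+\!+} S'.
\end{array}
\]

It concatenates S and S', making the result well-formed by dropping signature abbreviations if dom(S) ∩ dom(S') ≠ ∅.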

• We write both "=" and ":=" in side-conditions. Interpreting the rules algorithmically, the former pattern-matches inputs, and the latter specifies an output.
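The function S ++ S' defined above can be transcribed almost directly into Python. In this sketch a signature abbreviation list is assumed to be a Python list of (identifier, binding) pairs, with signature and unit identifiers drawn from disjoint sets of strings; the name `concat` is hypothetical.

```python
def concat(s, s_prime):
    """Compute S ++ S' for lists of (identifier, binding) pairs."""
    if not s:
        # (. ++ S') = S'
        return list(s_prime)
    (ident, binding), rest = s[0], s[1:]
    tail = concat(rest, s_prime)  # this is S'' in the definition
    if ident in {name for name, _ in tail}:
        # The identifier is bound again later, so this binding is dropped.
        return tail
    return [(ident, binding)] + tail
```

For instance, concatenating the lists `[("A", 1), ("B", 2)]` and `[("A", 3)]` drops the first binding of `A` and keeps the one contributed by the right-hand list.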
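decs ⊢ L ok

\[
\frac{\begin{array}{c}
\mathit{decs} \vdash \mathit{sdecs}_0 \text{ ok} \\
\mathit{decs}, \mathit{sdecs}_0 \vdash \mathit{sbnds} : \mathit{sdecs} \\
\mathit{decs}, \mathit{sdecs}_0, \mathit{sdecs} \vdash S \text{ ok} \\
\mathit{sdecs}_0 = \mathit{lab}_1 \triangleright \mathit{var}_1{:}[\mathit{sdecs}_1], \ldots, \mathit{lab}_n \triangleright \mathit{var}_n{:}[\mathit{sdecs}_n]
\end{array}}
{\mathit{decs} \vdash \mathit{sdecs}_0 \to \mathit{sbnds} : \mathit{sdecs}; S \text{ ok}}
\quad (1)
\]

Rule 1: Imports are restricted to structures. The elaborator in Appendix B needs nothing else.

decs ⊢ S ok

\[
\frac{\vdash \mathit{decs} \text{ ok}}{\mathit{decs} \vdash \cdot \text{ ok}}
\quad (2)
\]

\[
\frac{\mathit{decs} \vdash \mathit{sig} : \mathrm{Sig} \qquad \mathit{decs} \vdash S \text{ ok} \qquad \mathit{sigid} \notin \mathrm{dom}(S)}
{\mathit{decs} \vdash \mathit{sigid}{=}\mathit{sig}, S \text{ ok}}
\quad (3)
\]

\[
\frac{\begin{array}{c}
\mathit{decs} \vdash S' \text{ ok} \qquad \mathit{decs} \vdash S \text{ ok} \qquad \mathit{unitid} \notin \mathrm{dom}(S) \\
S' = (\mathit{sigid}_1{=}\mathit{sig}_1, \ldots, \mathit{sigid}_n{=}\mathit{sig}_n)
\end{array}}
{\mathit{decs} \vdash \mathit{unitid}{=}S', S \text{ ok}}
\quad (4)
\]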

L  -w  exp  :  {} 


lab  0  dom(sdecs) 

^  sbnds  :  sdecs-,  S  [sbnds,  lab={}].lab  :  {} 

L basis  ■ —  ■  ^  basis — mod  basis  ■  bssis.si^  , 

b  Lbasis~H-L  L'  L'  exp  :  {} 

L  -w  exp  :  {} 


(5) 

(6) 


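decs ⊢ L ++ L' ⇝ L''

\[
\frac{L = \mathit{sdecs}_0 \to \mathit{sbnds} : \mathit{sdecs}; S}
{\mathit{decs} \vdash L \mathbin{+\!+} (\cdot \to \mathit{sbnds}' : \mathit{sdecs}'; S') \leadsto
 \mathit{sdecs}_0 \to \mathit{sbnds} \mathbin{+\!+} \mathit{sbnds}' : \mathit{sdecs} \mathbin{+\!+} \mathit{sdecs}'; S \mathbin{+\!+} S'}
\quad (7)
\]

\[
\frac{\begin{array}{c}
\mathit{sdecs} = \mathit{sdecs}'', \mathit{lab} \triangleright \mathit{var}'{:}\mathit{sig}', \mathit{sdecs}''' \\
\mathit{decs}, \mathit{sdecs}_0, \mathit{sdecs} \vdash_{\mathrm{sub}} \mathit{var}'{:}\mathit{sig}' \preceq \mathit{sig} \leadsto \mathit{mod} : \mathit{sig}'' \\
\mathit{sbnd} := l \triangleright \mathit{var}{=}\mathit{mod} \qquad \mathit{sdec} := l \triangleright \mathit{var}{:}\mathit{sig}'' \\
L := \mathit{sdecs}_0 \to \mathit{sbnds} \mathbin{+\!+} \mathit{sbnd} : \mathit{sdecs} \mathbin{+\!+} \mathit{sdec}; S \\
\mathit{decs} \vdash L \mathbin{+\!+} (\mathit{sdecs}_1 \to \mathit{sbnds}' : \mathit{sdecs}'; S') \leadsto L''
\end{array}}
{\mathit{decs} \vdash (\mathit{sdecs}_0 \to \mathit{sbnds} : \mathit{sdecs}; S) \mathbin{+\!+}
 (\mathit{lab} \triangleright \mathit{var}{:}\mathit{sig}, \mathit{sdecs}_1 \to \mathit{sbnds}' : \mathit{sdecs}'; S') \leadsto L''}
\quad (8)
\]

\[
\frac{\begin{array}{c}
\mathit{lab} \notin \mathrm{dom}(\mathit{sdecs}) \\
\mathit{sdecs}_0 = \mathit{sdecs}'', \mathit{lab} \triangleright \mathit{var}'{:}\mathit{sig}', \mathit{sdecs}''' \\
\mathit{decs}, \mathit{sdecs}_0, \mathit{sdecs} \vdash \mathit{sig} \equiv \mathit{sig}' : \mathrm{Sig} \\
L' := \{\mathit{var}'/\mathit{var}\}(\mathit{sdecs}_1 \to \mathit{sbnds}' : \mathit{sdecs}'; S') \\
\mathit{decs} \vdash (\mathit{sdecs}_0 \to \mathit{sbnds} : \mathit{sdecs}; S) \mathbin{+\!+} L' \leadsto L''
\end{array}}
{\mathit{decs} \vdash (\mathit{sdecs}_0 \to \mathit{sbnds} : \mathit{sdecs}; S) \mathbin{+\!+}
 (\mathit{lab} \triangleright \mathit{var}{:}\mathit{sig}, \mathit{sdecs}_1 \to \mathit{sbnds}' : \mathit{sdecs}'; S') \leadsto L''}
\quad (9)
\]

\[
\frac{\begin{array}{c}
\mathit{lab} \notin \mathrm{dom}(\mathit{sdecs}) \cup \mathrm{dom}(\mathit{sdecs}_0) \\
\mathit{decs}, \mathit{sdecs}_0, \mathit{sdecs} \vdash \mathit{sig} \equiv \mathit{sig}' : \mathrm{Sig} \\
\mathit{decs}, \mathit{sdecs}_0 \vdash \mathit{sig}' : \mathrm{Sig} \\
L := \mathit{sdecs}_0, \mathit{lab} \triangleright \mathit{var}{:}\mathit{sig}' \to \mathit{sbnds} : \mathit{sdecs}; S \\
\mathit{decs} \vdash L \mathbin{+\!+} (\mathit{sdecs}_1 \to \mathit{sbnds}' : \mathit{sdecs}'; S') \leadsto L''
\end{array}}
{\mathit{decs} \vdash (\mathit{sdecs}_0 \to \mathit{sbnds} : \mathit{sdecs}; S) \mathbin{+\!+}
 (\mathit{lab} \triangleright \mathit{var}{:}\mathit{sig}, \mathit{sdecs}_1 \to \mathit{sbnds}' : \mathit{sdecs}'; S') \leadsto L''}
\quad (10)
\]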
B TS Elaboration Rules


We change the TS elaborator to expand signature abbreviations. First, we modify every TS elaboration judgement and rule using a TS elaboration context sdecs to use sdecs; S. A context sdecs; S binds variables BV(sdecs) with scope S. We define BV(Γ) by BV(sdecs). Second, we extend the syntax for TSEL signature expressions:

sigexp  ::= 

sigid  signature  identifier 

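Finally, we extend the TS judgement Γ ⊢ sigexp ⇝ sig : Sig, adding the rule

\[
\frac{\Gamma \vdash_{\mathrm{ctx}} \mathit{sigid} \leadsto \mathit{sig} : \mathrm{Sig}}
     {\Gamma \vdash \mathit{sigid} \leadsto \mathit{sig} : \mathrm{Sig}}
\]

to elaborate signature identifiers.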

We  use  the  following  definitions  and  notation. 

• To extend an elaboration context Γ = sdecs; S, we write

Γ, dec for sdecs, l ▷ dec; S,
Γ, sdecs' for sdecs, sdecs'; S, and
Γ, S' for sdecs; S, S'.

We also define a function R(sdecs) that renames the labels in sdecs to make them inaccessible to identifier resolution:

\[
R(\mathit{lab}_1 \triangleright \mathit{dec}_1, \ldots, \mathit{lab}_n \triangleright \mathit{dec}_n) = l \triangleright \mathit{dec}_1, \ldots, l \triangleright \mathit{dec}_n.
\]

• We define a function U(sdecs) that drops the labels in sdecs:

\[
U(\mathit{lab}_1 \triangleright \mathit{dec}_1, \ldots, \mathit{lab}_n \triangleright \mathit{dec}_n) = \mathit{dec}_1, \ldots, \mathit{dec}_n.
\]

• When an elaboration context Γ = sdecs; S appears in a judgement requiring an IL context decs, we implicitly coerce Γ to U(sdecs).

• We define the substitution function σ(var, sdecs, S) by

\[
\begin{array}{rcl}
\sigma(\mathit{var}, \cdot, S) &=& S \\
\sigma(\mathit{var}, (\mathit{lab} \triangleright \mathit{dec}, \mathit{sdecs}), S) &=& \{\mathit{var}.\mathit{lab}/\mathrm{BV}(\mathit{dec})\}\,\sigma(\mathit{var}, \mathit{sdecs}, S)
\end{array}
\]

where {path/var}S denotes the capture-free substitution of path for free occurrences of var in S. Rule 14 uses σ to elaborate source units.


project  L 


.  ( . 


(11) 
project ⇝ L    basis ∉ BV(L)

L  =  sdecsQ  sbnds  :  sdecs]  S 
r  :=  basis : sig R{s decs o),  s decs]  S 
r  h  srcunit  L'  var  0  BV(L^) 
sdecs'i  sbnds'  :  sdecs']S'  :=  {var/basis}L' 
sdecsi  :=  bas\s\> v ar : sig sdecs'i 
h  L+]-{sdecsi  sbnds'  :  sdecs']  S')  L" 

project,  srcunit  L" 


(12) 


Rule  12:  The  side-condition  basis  0  BV(L)  can  always  be  achieved  by  renaming  bound  variables 
in  L. 


project  L 

BV(L)  n  BV(L')  =  0  h  L'  ok  h  L+fL'  -w  L" 
project,  L'  -w  L" 


r  h  srcunit  L 


r  h  topdec  L 
L  =  sdecso  sbnds  :  sdecs]  S 
var  0  BV(r)  U  BV(L) 
sbnds'  :=  unitid\>var=[sbnds\ 
sdecs'  :=  unitid\>var:[sdecs] 
S'  :=  unitid=a{var,  sdecs,  S) 

r  h  unit  unitid  =  topdec 
sdecso  sbnds'  :  sdecs']  S' 


r  h  topdec  L 


r  h  impexp  L 
r  h  import  impexp  L 

r  h  strdec  sbnds  :  sdecs 
r  h  strdec  ^  sbnds  :  sdecs]  ■ 

r  h  sigbind  5 

r  h  signature  sigbind  ^  S 


(13) 


(14) 


(15) 

(16) 

(17) 
Γ ⊢ topdec ⇝ sdecs0 → sbnds : sdecs; S
var  0  BV(r)  U  BV(sdecso) 

T ,  R(sdecso),l*\>var:[sdecs],  S  topdec'  L' 
L  :=  sdecso  l\>var=[sbnds]  :  l\>var:[sdecs]]  ■ 
r  h  L-\+L'  -w  L" 

r  h  local  topdec  in  topdec'  end  L" 

r  h  topdec  L 
L  =  sdecsQ  sbnds  :  sdecs]  S 
r,  R(sdecso),  sdecs,  S  h  topdec'  ^  L' 
r  h  LVrL'  -w  L" 

r  h  topdec  topdec  L" 


(18) 


(19) 


r  h  impexp  L 


r  hctx  unitid  var  :  sig 
r  hctx  unitid  S  var'  0  BV(r) 

L  :=  unitid\>var':sig  l*=var'  :  l*:sig]  S 

r  h  unitid  L 

Rule  20:  Rules  12,  18,  and  19  use  R{-)  to  hide  imported  units  from  IR  imports.  The  signature  sig 
should  be  fully  selfified. 


r  h  spec  sdecs  var'  0  BV(r) 
r,  var'  :  [sdecs]  h  var'  :  sig 
L  :=  unitid\> var' -.[sdecs]  l*=var'  :  l*:sig-,  • 

r  h  unitid  :  intf  spec  end  -w  L 

Rule  21:  The  signature  sig  should  be  fully  selfified. 

T  h  impexp  L  T  h  impexp'  L' 
BV(L)  n  BV(L')  =  0  T  h  L+fL'  -w  L" 

T  h  impexp  impexp'  L" 


T  h  sigbind  S 

T  h  sigexp  sig  :  Sig  S  :=  sigid=sig 
(T  h  sigbind  S'  sigid  0  dom(5')) 

T  h  sigid  =  sigexp  (and  sigbind)  -  w5(,5') 

Rule  23:  Either  all  optional  elements  or  none  must  be  present. 


(21) 


(22) 


(23) 
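Γ ⊢ctx sigid ⇝ sig : Sig

\[
\frac{}{\mathit{sdecs}; S, \mathit{sigid}{=}\mathit{sig} \vdash_{\mathrm{ctx}} \mathit{sigid} \leadsto \mathit{sig} : \mathrm{Sig}}
\quad (24)
\]

\[
\frac{\mathit{sigid}' \neq \mathit{sigid} \qquad \mathit{sdecs}; S \vdash_{\mathrm{ctx}} \mathit{sigid} \leadsto \mathit{sig} : \mathrm{Sig}}
     {\mathit{sdecs}; S, \mathit{sigid}'{=}\mathit{sig}' \vdash_{\mathrm{ctx}} \mathit{sigid} \leadsto \mathit{sig} : \mathrm{Sig}}
\quad (25)
\]

\[
\frac{\mathit{sdecs}; S \vdash_{\mathrm{ctx}} \mathit{sigid} \leadsto \mathit{sig} : \mathrm{Sig}}
     {\mathit{sdecs}; S, \mathit{unitid}{=}S' \vdash_{\mathrm{ctx}} \mathit{sigid} \leadsto \mathit{sig} : \mathrm{Sig}}
\quad (26)
\]

Γ ⊢ctx unitid ⇝ S

\[
\frac{}{\mathit{sdecs}; S, \mathit{unitid}{=}S' \vdash_{\mathrm{ctx}} \mathit{unitid} \leadsto S'}
\quad (27)
\]

\[
\frac{\mathit{unitid}' \neq \mathit{unitid} \qquad \mathit{sdecs}; S \vdash_{\mathrm{ctx}} \mathit{unitid} \leadsto S''}
     {\mathit{sdecs}; S, \mathit{unitid}'{=}S' \vdash_{\mathrm{ctx}} \mathit{unitid} \leadsto S''}
\quad (28)
\]

\[
\frac{\mathit{sdecs}; S \vdash_{\mathrm{ctx}} \mathit{unitid} \leadsto S'}
     {\mathit{sdecs}; S, \mathit{sigid}{=}\mathit{sig} \vdash_{\mathrm{ctx}} \mathit{unitid} \leadsto S'}
\quad (29)
\]

Γ ok

\[
\frac{\vdash U(\mathit{sdecs}) \text{ ok}}{\mathit{sdecs}; \cdot \text{ ok}}
\quad (30)
\]

\[
\frac{\mathit{sdecs}; S \text{ ok} \qquad \mathit{sdecs} \vdash \mathit{sig} : \mathrm{Sig}}
     {\mathit{sdecs}; S, \mathit{sigid}{=}\mathit{sig} \text{ ok}}
\quad (31)
\]

\[
\frac{\begin{array}{c}
\mathit{sdecs}; S \text{ ok} \qquad \mathit{sdecs} \vdash S' \text{ ok} \\
S' = (\mathit{sigid}_1{=}\mathit{sig}_1, \ldots, \mathit{sigid}_n{=}\mathit{sig}_n)
\end{array}}
{\mathit{sdecs}; S, \mathit{unitid}{=}S' \text{ ok}}
\quad (32)
\]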
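Read algorithmically, the context lookup rules above scan the signature abbreviation list from its right end: the rightmost binding of an identifier of the requested class wins, and bindings of the other class are skipped. A minimal Python sketch follows, assuming the list is encoded as (class, identifier, value) triples; the encoding and the name `lookup` are hypothetical.

```python
def lookup(kind, ident, abbrevs):
    """Look up a signature identifier (kind "sig") or a unit identifier
    (kind "unit") in a list of (kind, identifier, value) triples."""
    for entry_kind, entry_ident, value in reversed(abbrevs):
        # Entries of the other class, or with another name, are skipped.
        if entry_kind == kind and entry_ident == ident:
            return value
    # No rule applies to the empty list, so the lookup fails.
    raise KeyError(ident)


S = [
    ("sig", "ORDERED", "sig-v1"),
    ("unit", "Sort", [("sig", "SORT", "sig-sort")]),
    ("sig", "ORDERED", "sig-v2"),
]
```

Here looking up the signature identifier `ORDERED` yields the later binding `"sig-v2"`, and looking up the unit identifier `Sort` yields the list of signature abbreviations recorded for that unit.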
B or T, F, G, E       ∈  Basis      =  TyNameSet × FunEnv × SigEnv × Env
T                     ∈  TyNameSet  =  Fin(TyName)
F                     ∈  FunEnv     =  Funid →fin FunSig
G                     ∈  SigEnv     =  Sigid →fin Sig
E or (SE, TE, VE)     ∈  Env        =  StrEnv × TyEnv × ValEnv
Φ or (T)(E, (T')E')   ∈  FunSig     =  TyNameSet × (Env × Sig)
Σ or (T)E             ∈  Sig        =  TyNameSet × Env
SE                    ∈  StrEnv     =  Strid →fin Env
TE                    ∈  TyEnv      =  TyCon →fin TyStr
VE                    ∈  ValEnv     =  VId →fin TypeScheme × IdStatus
t                     ∈  TyName     (type names)
funid                 ∈  Funid      (functor identifiers)
sigid                 ∈  Sigid      (signature identifiers)
strid                 ∈  Strid      (structure identifiers)
tycon                 ∈  TyCon      (type constructors)
vid                   ∈  VId        (value identifiers)

Figure 20: Static TDIL for Standard ML (summary). Fin(A) denotes the set of finite subsets of A.


C  Unit  Static  Semantic  Rules 


We  use  the  following  definitions  and  notation. 

• TD's static semantics. We recall the static TDIL for Standard ML in Figure 20.

We write E_∅ for the empty environment ({}, {}, {}).

We write tynames A for the set of free type names in the semantic object A.

We write tyvars A for the set of free type variables in the semantic object A [16, Section 4.2].

For any semantic object A, we write A wf for "every type structure occurring in A is well-formed" [16, Section 4.9].

We assume familiarity with realizations φ and their support Supp φ [16, Section 5.2].

We assume familiarity with signature instantiation and enrichment [16, Sections 5.3 and 5.5].


• Sets. We write A \ B for {a ∈ A ; a ∉ B}.

We write #A for the cardinality of the finite set A.


• Finite maps. The domain of a finite map f : A →fin B is the set dom(f) of those a ∈ A for which f is defined.

Finite maps may be specified explicitly; for example {a1 ↦ b1, ..., an ↦ bn} (n ≥ 0).

We extend this notation to a form of set comprehension, writing {a ↦ b ; φ} for the map taking every a satisfying φ to b(a).

• Extension. To extend finite maps f, g : A →fin B, we define f + g : A →fin B by

\[
(f + g)(a) =
\begin{cases}
g(a) & \text{if } a \in \mathrm{dom}(g) \\
f(a) & \text{otherwise.}
\end{cases}
\]
We define T + T' by T ∪ T'.

We extend semantic objects componentwise; for example, E + E' and B + B' have the evident definitions.

We extend components in a semantic object when it is unambiguous to do so; for example,

\[
\begin{array}{rcl}
\Gamma + U &=& B, U' + U \\
\Gamma + B' &=& B + B', U' \\
\Gamma + T &=& B + T, U' = (T' + T, F, G, E), U'
\end{array}
\]

where Γ = B, U' and B = T', F, G, E.

We lift application to finite maps in semantic objects when it is unambiguous to do so; for example,

\[
\begin{array}{rcl}
\Gamma(\mathit{unitid}) &=& U(\mathit{unitid}) \\
B(\mathit{strid}) &=& E(\mathit{strid}) = \mathit{SE}(\mathit{strid})
\end{array}
\]

where Γ = B, U; B = T, F, G, E; and E = (SE, TE, VE).

• Projection. We write (· of ·) for projection from semantic objects; for example,

\[
T \text{ of } \Gamma = T \text{ of } B = T'
\]

where Γ = B, U and B = T', F, G, E.
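The finite-map notation above has a direct reading in terms of Python dictionaries. The sketch below models a finite map as a `dict`; the helper names `extend` and `comprehension` are hypothetical and serve only to illustrate f + g and the comprehension form {a ↦ b ; φ}.

```python
def extend(f, g):
    """Return f + g: bindings of g take precedence over those of f."""
    return {**f, **g}


def comprehension(candidates, b, phi):
    """Return the map taking every a in candidates that satisfies phi to b(a)."""
    return {a: b(a) for a in candidates if phi(a)}


f = {"x": 1, "y": 2}
g = {"y": 3, "z": 4}
h = extend(f, g)  # "y" is taken from g; "x" and "z" are kept unchanged
```

The extension is not symmetric: `extend(g, f)` would keep the binding of `"y"` from `f` instead.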


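Γ ⊢ udecs : uspecs

\[
\frac{\Gamma \text{ ok}}{\Gamma \vdash \cdot : \cdot}
\quad (33)
\]

\[
\frac{\begin{array}{c}
\mathit{srcunit} = (\texttt{unit } \mathit{unitid} = \texttt{unit } \mathit{topdec} \texttt{ end}) \qquad \mathcal{I} = (T)(F, G, E) \\
\Gamma \vdash \mathit{topdec} : T, F, G, E \\
\Gamma + T + \{\mathit{unitid} \mapsto T, F, G, E\} \vdash \mathit{udecs} : \mathit{uspecs}
\end{array}}
{\Gamma \vdash (\mathit{srcunit}, \mathit{udecs}) : (\mathit{unitid}{:}\mathcal{I}, \mathit{uspecs})}
\quad (34)
\]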


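Γ ⊢ topdec : B

\[
\frac{\Gamma \vdash \mathit{impexp} : F, G, E}
     {\Gamma \vdash \texttt{import } \mathit{impexp} : \emptyset, F, G, E}
\quad (35)
\]

\[
\frac{B \vdash \mathit{strdec} \Rightarrow E \qquad T := \mathrm{tynames}\, E \setminus (T \text{ of } B) \qquad \mathrm{tyvars}\, E = \emptyset}
     {B, U \vdash \mathit{strdec} : T, \{\}, \{\}, E}
\quad (36)
\]

\[
\frac{B \vdash \mathit{sigdec} \Rightarrow G}
     {B, U \vdash \mathit{sigdec} : \emptyset, \{\}, G, E_\emptyset}
\quad (37)
\]

Rule 37: TD's Rules 65 and 67 ensure that tynames G = ∅. TD's rules for B ⊢ spec ⇒ E ensure that tyvars G = ∅.

\[
\frac{B \vdash \mathit{fundec} \Rightarrow F \qquad T := \mathrm{tynames}\, F \setminus (T \text{ of } B) \qquad \mathrm{tyvars}\, F = \emptyset}
     {B, U \vdash \mathit{fundec} : T, F, \{\}, E_\emptyset}
\quad (38)
\]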


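\[
\frac{\Gamma \vdash \mathit{topdec} : B \qquad \Gamma + B \vdash \mathit{topdec}' : T', F', G', E' \qquad T := (T \text{ of } B) \cup T'}
     {\Gamma \vdash \texttt{local } \mathit{topdec} \texttt{ in } \mathit{topdec}' \texttt{ end} : T, F', G', E'}
\quad (39)
\]

\[
\frac{\Gamma \vdash \mathit{topdec} : B \qquad \Gamma + B \vdash \mathit{topdec}' : B'}
     {\Gamma \vdash \mathit{topdec}\ \mathit{topdec}' : B + B'}
\quad (40)
\]

Γ ⊢ impexp : F, G, E

\[
\frac{\Gamma(\mathit{unitid}) = T, F, G, E}
     {\Gamma \vdash \mathit{unitid} : F, G, E}
\quad (41)
\]

\[
\frac{\begin{array}{c}
B \text{ of } \Gamma \vdash \mathit{intexp} \Rightarrow \mathcal{I} \qquad \mathcal{I} = (T)(F, \{\}, E) \\
\Gamma(\mathit{unitid}) = T', F', G', E' \qquad F', G', E' : \mathcal{I} \text{ using } \varphi
\end{array}}
{\Gamma \vdash (\mathit{unitid} : \mathit{intexp}) : \varphi(F), \{\}, \varphi(E)}
\quad (42)
\]

\[
\frac{\Gamma \vdash \mathit{impexp} : F, G, E \qquad \Gamma \vdash \mathit{impexp}' : F', G', E'}
     {\Gamma \vdash \mathit{impexp}\ \mathit{impexp}' : F + F', G + G', E + E'}
\quad (43)
\]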


B  h  intexp  T 


B  h  top  spec  F,E  T  :=  (tynames  E,  E)\{T  of  F) 

B  h  intf  topspee  end  {T){F,  {},F) 


(44) 


B  h  topspee  ^  E,E 

B  h  spec  E 

-IFZ - TTIT 

B  h  spec  {},  F 
B  h  funspee  F 

- ^ - - -  (46) 

B  h  functor  funspee  ^  F,  {} 

B  h  topspee  F,E  F  +  F  h  topspee'  F',E' 

dom(F)  n  dom(F')  =  0  dom(F)  n  dom(F')  =  0  (47) 

B  h  topspee  topspee'  F+F',  E+E' 




B  h  funspec  F 


B  h  sigexp  {T)E  B  +  T  +  {strid  h  sigexp'  ^  {T')E' 

F  :=  {funid  ^  {T){E,  {T')E')} 

{B  h  funspec  F'  funid  0  dom(F')) 

B  h  funid{strid  :  sigexp)  :  sigexp'  (and  funspec)  ^  F{+E') 


(48) 


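Γ ok

\[
\frac{\mathrm{tynames}\, F, G, E, U \subseteq T \qquad F, G, E, U \text{ wf}}
     {(T, F, G, E), U \text{ ok}}
\quad (49)
\]

Γ ⊢ ℐ : Sig

\[
\frac{\begin{array}{c}
T = T \text{ of } \Gamma \\
T \cap T' = \emptyset \qquad \mathrm{tynames}\, F, G, E \subseteq T \cup T' \\
\mathrm{tyvars}\, F, G, E = \emptyset \qquad F, G, E \text{ wf}
\end{array}}
{\Gamma \vdash (T')(F, G, E) : \mathrm{Sig}}
\quad (50)
\]

Γ ⊢ uspecs ok

\[
\frac{\Gamma \text{ ok}}{\Gamma \vdash \cdot \text{ ok}}
\quad (51)
\]

\[
\frac{\Gamma \vdash \mathcal{I} : \mathrm{Sig} \qquad \mathcal{I} = (T')(F, G, E) \qquad \Gamma + T' \vdash \mathit{uspecs} \text{ ok}}
     {\Gamma \vdash \mathit{unitid}{:}\mathcal{I}, \mathit{uspecs} \text{ ok}}
\quad (52)
\]

E : Σ using φ

\[
\frac{\Sigma \geq E' \text{ using } \varphi \qquad E \succ E'}
     {E : \Sigma \text{ using } \varphi}
\quad (53)
\]

Σ = Σ' using φ

\[
\frac{T = \varphi(T') \qquad E = \varphi(E') \qquad \mathrm{Supp}\, \varphi \subseteq T' \qquad \#T = \#T'}
     {(T)E = (T')E' \text{ using } \varphi}
\quad (54)
\]

ℐ = ℐ' using φ

\[
\frac{(T)E = (T')E' \text{ using } \varphi \qquad F = \varphi(F') \qquad G = \varphi(G')}
     {(T)(F, G, E) = (T')(F', G', E') \text{ using } \varphi}
\quad (55)
\]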


F,G,E  :  T  using  ip 


E  :  {T)E'  using  (p 
dom(F)  D  dom(F') 


Mfunid  G  dom(F').  p>2  < 


E[  :  (Fi)Fi  using  ipi 
ipi{E2)  :  (r^)F^  using  ip2 
where  (r()(F(,  (r^)F0  =  ip{F' {funid)) 
and  (ri)(Fi,  {T2)E2)  =  F{funid) 


dom(G)  D  dom(G') 

Msigid  G  diom.{G').3(p' .G{sigid)  =  cp{G\sigid))  using  cp' 


F,G,E:{T){F',G',E')  using 


(56) 




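\[
\begin{array}{rclcl}
(F, G, E) \text{ or } B & \in & \mathrm{Basis} & = & \mathrm{FunEnv} \times \mathrm{SigEnv} \times \mathrm{Env} \\
F & \in & \mathrm{FunEnv} & = & \mathrm{Funid} \xrightarrow{\mathrm{fin}} \mathrm{FunctorClosure} \\
G & \in & \mathrm{SigEnv} & = & \mathrm{Sigid} \xrightarrow{\mathrm{fin}} \mathrm{Int} \\
(SE, TE, VE) \text{ or } E & \in & \mathrm{Env} & = & \mathrm{StrEnv} \times \mathrm{TyEnv} \times \mathrm{ValEnv} \\
(\mathit{strid} : I, \mathit{strexp}, B) & \in & \mathrm{FunctorClosure} & = & (\mathrm{Strid} \times \mathrm{Int}) \times \mathrm{StrExp} \times \mathrm{Basis} \\
(SI, TI, VI) \text{ or } I & \in & \mathrm{Int} & = & \mathrm{StrInt} \times \mathrm{TyInt} \times \mathrm{ValInt} \\
SE & \in & \mathrm{StrEnv} & = & \mathrm{Strid} \xrightarrow{\mathrm{fin}} \mathrm{Env} \\
TE & \in & \mathrm{TyEnv} & = & \mathrm{TyCon} \xrightarrow{\mathrm{fin}} \mathrm{ValEnv} \\
VE & \in & \mathrm{ValEnv} & = & \mathrm{VId} \xrightarrow{\mathrm{fin}} \mathrm{Val} \times \mathrm{IdStatus} \\
SI & \in & \mathrm{StrInt} & = & \mathrm{Strid} \xrightarrow{\mathrm{fin}} \mathrm{Int} \\
TI & \in & \mathrm{TyInt} & = & \mathrm{TyCon} \xrightarrow{\mathrm{fin}} \mathrm{ValInt} \\
VI & \in & \mathrm{ValInt} & = & \mathrm{VId} \xrightarrow{\mathrm{fin}} \mathrm{IdStatus} \\
(G, I) \text{ or } IB & \in & \mathrm{IntBasis} & = & \mathrm{SigEnv} \times \mathrm{Int}
\end{array}
\]

Figure 21: Dynamic TDIL for Standard ML (summary)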


D  Unit  Dynamic  Semantic  Rules 

We  use  the  following  definitions  and  notation. 

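•  We recall the dynamic TDIL for Standard ML in Figure 21.

•  We extend the TDIL category FunctorClosure as follows, permitting an optional ascribed
interface on the functor body.

\[
\begin{array}{rcl}
\mathrm{FunctorClosure} & = & \mathrm{FClos} \uplus \mathrm{FClos}' \\
(\mathit{strid} : I, \mathit{strexp}, B) \in \mathrm{FClos} & = & (\mathrm{Strid} \times \mathrm{Int}) \times \mathrm{StrExp} \times \mathrm{Basis} \\
(\mathit{strid} : I, \mathit{strexp} : I', B) \in \mathrm{FClos}' & = & (\mathrm{Strid} \times \mathrm{Int}) \times (\mathrm{StrExp} \times \mathrm{Int}) \times \mathrm{Basis}
\end{array}
\]

Here, $A \uplus B$ denotes the disjoint union of $A$ and $B$.

•  We define the function $\downarrow : \mathrm{Basis} \times \mathrm{UnitInt} \to \mathrm{Basis}$ that cuts down a basis $B$ to match a unit
interface $UI$:

\[
\begin{array}{l}
\downarrow \; : \; \mathrm{Basis} \times \mathrm{UnitInt} \to \mathrm{Basis} \\
(F, G, E) \downarrow (FI, I) = (F \downarrow FI, \{\}, E \downarrow I) \\[6pt]
\downarrow \; : \; \mathrm{FunEnv} \times \mathrm{FunIntEnv} \to \mathrm{FunEnv} \\
F \downarrow FI = \{\mathit{funid} \mapsto F(\mathit{funid}) \downarrow FI(\mathit{funid}) \; ; \; \mathit{funid} \in \mathrm{dom}(F) \cap \mathrm{dom}(FI)\} \\[6pt]
\downarrow \; : \; \mathrm{FunctorClosure} \times (\mathrm{Int} \times \mathrm{Int}) \to \mathrm{FunctorClosure} \\
(\mathit{strid} : I_0, \mathit{strexp}, B) \downarrow (I, I') = (\mathit{strid} : I, \mathit{strexp} : I', B) \\
(\mathit{strid} : I_0, \mathit{strexp} : I_0', B) \downarrow (I, I') = (\mathit{strid} : I, \mathit{strexp} : I', B).
\end{array}
\]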

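•  TD's evaluation judgement $s, B \vdash \mathit{strexp} \Rightarrow E/p, s'$ handles functor application for functor
closures of the form FClos. We extend that judgement by adding the following rules to handle
functor closures of the form FClos′.

\[
\frac{B(\mathit{funid}) = (\mathit{strid} : I, \mathit{strexp}' : I', B') \qquad s, B \vdash \mathit{strexp} \Rightarrow E, s' \qquad s', B' + \{\mathit{strid} \mapsto E \downarrow I\} \vdash \mathit{strexp}' \Rightarrow E', s''}
     {s, B \vdash \mathit{funid}(\mathit{strexp}) \Rightarrow E' \downarrow I', s''}
\]

\[
\frac{B(\mathit{funid}) = (\mathit{strid} : I, \mathit{strexp}' : I', B') \qquad s, B \vdash \mathit{strexp} \Rightarrow p, s'}
     {s, B \vdash \mathit{funid}(\mathit{strexp}) \Rightarrow p, s'}
\]

\[
\frac{B(\mathit{funid}) = (\mathit{strid} : I, \mathit{strexp}' : I', B') \qquad s, B \vdash \mathit{strexp} \Rightarrow E, s' \qquad s', B' + \{\mathit{strid} \mapsto E \downarrow I\} \vdash \mathit{strexp}' \Rightarrow p, s''}
     {s, B \vdash \mathit{funid}(\mathit{strexp}) \Rightarrow p, s''}
\]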
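
The cut-down operation used in these definitions can be illustrated with a small Standard ML sketch. It is a deliberate simplification and not the semantics itself: environments carry only substructures and integer-valued variables, interfaces list only the names they expose, and association lists stand in for finite maps. The names `env`, `iface`, and `cut` are hypothetical.

```sml
(* Simplified environments and interfaces (assumption: substructures and
   values only; association lists stand in for finite maps). *)
datatype env   = Env   of (string * env) list * (string * int) list
datatype iface = Iface of (string * iface) list * string list

(* cut (E, I) keeps exactly those components of E that I mentions,
   recursively cutting down substructures. *)
fun cut (Env (se, ve), Iface (si, vi)) =
  let
    fun lookup strid = List.find (fn (s, _) => s = strid) se
    fun cutStr (strid, i) =
      case lookup strid of
        SOME (_, e) => SOME (strid, cut (e, i))
      | NONE => NONE
    fun visible (vid, _) = List.exists (fn v => v = vid) vi
  in
    Env (List.mapPartial cutStr si, List.filter visible ve)
  end

val e = Env ([("Sub", Env ([], [("a", 1), ("b", 2)]))], [("x", 10), ("y", 20)])
val i = Iface ([("Sub", Iface ([], ["a"]))], ["y"])
val cutE = cut (e, i)
```

In this sketch `cutE` retains only `Sub.a` and `y`, which mirrors how a functor result or an imported unit is restricted to its ascribed interface.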


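Judgement: $s, \Gamma \vdash \mathit{udecs} \Rightarrow \Gamma'/p, s'$

\[
\frac{}{s, \Gamma \vdash \bullet \Rightarrow \Gamma, s}
\qquad (57)
\]

\[
\frac{\begin{array}{c}
\mathit{srcunit} = (\texttt{unit}\ \mathit{unitid} = \texttt{unit}\ \mathit{topdec}\ \texttt{end}) \\
s, \Gamma \vdash \mathit{topdec} \Rightarrow B, s' \qquad s', \Gamma + \{\mathit{unitid} \mapsto B\} \vdash \mathit{udecs} \Rightarrow \Gamma', s''
\end{array}}
{s, \Gamma \vdash \mathit{srcunit}, \mathit{udecs} \Rightarrow \Gamma', s''}
\qquad (58)
\]

\[
\frac{\mathit{srcunit} = (\texttt{unit}\ \mathit{unitid} = \texttt{unit}\ \mathit{topdec}\ \texttt{end}) \qquad s, \Gamma \vdash \mathit{topdec} \Rightarrow p, s'}
{s, \Gamma \vdash \mathit{srcunit}, \mathit{udecs} \Rightarrow p, s'}
\qquad (59)
\]

\[
\frac{\begin{array}{c}
\mathit{srcunit} = (\texttt{unit}\ \mathit{unitid} = \texttt{unit}\ \mathit{topdec}\ \texttt{end}) \\
s, \Gamma \vdash \mathit{topdec} \Rightarrow B, s' \qquad s', \Gamma + \{\mathit{unitid} \mapsto B\} \vdash \mathit{udecs} \Rightarrow p, s''
\end{array}}
{s, \Gamma \vdash \mathit{srcunit}, \mathit{udecs} \Rightarrow p, s''}
\qquad (60)
\]

Judgement: $s, \Gamma \vdash \mathit{topdec} \Rightarrow B/p, s'$

\[
\frac{\Gamma \vdash \mathit{impexp} \Rightarrow B}
{s, \Gamma \vdash \texttt{import}\ \mathit{impexp} \Rightarrow B, s}
\qquad (61)
\]

\[
\frac{s, B \vdash \mathit{strdec} \Rightarrow E, s'}
{s, (B, U) \vdash \mathit{strdec} \Rightarrow (\{\}, \{\}, E), s'}
\qquad (62)
\]

\[
\frac{s, B \vdash \mathit{strdec} \Rightarrow p, s'}
{s, (B, U) \vdash \mathit{strdec} \Rightarrow p, s'}
\qquad (63)
\]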

Inter  B  h  sigdee  ^  G 
s,{B,U)  h  sigdee  ({},G,F;0),s 


(58) 

(59) 

(60) 

(61) 

(62) 

(63) 

(64) 
B  h  fundee  ^  F 

s,  {B,  U)  h  fundee  {F,  {},  £^0),  s 


(65) 


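\[
\frac{s, \Gamma \vdash \mathit{topdec} \Rightarrow B, s' \qquad s', \Gamma + B \vdash \mathit{topdec}' \Rightarrow B', s''}
{s, \Gamma \vdash \texttt{local}\ \mathit{topdec}\ \texttt{in}\ \mathit{topdec}'\ \texttt{end} \Rightarrow B', s''}
\qquad (66)
\]

\[
\frac{s, \Gamma \vdash \mathit{topdec} \Rightarrow p, s'}
{s, \Gamma \vdash \texttt{local}\ \mathit{topdec}\ \texttt{in}\ \mathit{topdec}'\ \texttt{end} \Rightarrow p, s'}
\qquad (67)
\]

\[
\frac{s, \Gamma \vdash \mathit{topdec} \Rightarrow B, s' \qquad s', \Gamma + B \vdash \mathit{topdec}' \Rightarrow p, s''}
{s, \Gamma \vdash \texttt{local}\ \mathit{topdec}\ \texttt{in}\ \mathit{topdec}'\ \texttt{end} \Rightarrow p, s''}
\qquad (68)
\]

\[
\frac{s, \Gamma \vdash \mathit{topdec} \Rightarrow B, s' \qquad s', \Gamma + B \vdash \mathit{topdec}' \Rightarrow B', s''}
{s, \Gamma \vdash \mathit{topdec}\ \mathit{topdec}' \Rightarrow B + B', s''}
\qquad (69)
\]

\[
\frac{s, \Gamma \vdash \mathit{topdec} \Rightarrow p, s'}
{s, \Gamma \vdash \mathit{topdec}\ \mathit{topdec}' \Rightarrow p, s'}
\qquad (70)
\]

\[
\frac{s, \Gamma \vdash \mathit{topdec} \Rightarrow B, s' \qquad s', \Gamma + B \vdash \mathit{topdec}' \Rightarrow p, s''}
{s, \Gamma \vdash \mathit{topdec}\ \mathit{topdec}' \Rightarrow p, s''}
\qquad (71)
\]

Judgement: $\Gamma \vdash \mathit{impexp} \Rightarrow B$

\[
\frac{\Gamma(\mathit{unitid}) = B'}
{\Gamma \vdash \mathit{unitid} \Rightarrow B'}
\qquad (72)
\]

\[
\frac{\mathrm{Inter}\, B \vdash \mathit{topspec} \Rightarrow UI \qquad U(\mathit{unitid}) = B'}
{B, U \vdash \mathit{unitid} : \texttt{intf}\ \mathit{topspec}\ \texttt{end} \Rightarrow B' \downarrow UI}
\qquad (73)
\]

\[
\frac{\Gamma \vdash \mathit{impexp} \Rightarrow B \qquad \Gamma \vdash \mathit{impexp}' \Rightarrow B'}
{\Gamma \vdash \mathit{impexp}\ \mathit{impexp}' \Rightarrow B + B'}
\qquad (74)
\]

Judgement: $IB \vdash \mathit{topspec} \Rightarrow UI$

\[
\frac{IB \vdash \mathit{spec} \Rightarrow I}
{IB \vdash \mathit{spec} \Rightarrow \{\}, I}
\qquad (75)
\]

\[
\frac{IB \vdash \mathit{funspec} \Rightarrow FI}
{IB \vdash \texttt{functor}\ \mathit{funspec} \Rightarrow FI, (\{\}, \{\}, \{\})}
\qquad (76)
\]

\[
\frac{IB \vdash \mathit{topspec} \Rightarrow FI, I \qquad IB + I \vdash \mathit{topspec}' \Rightarrow FI', I'}
{IB \vdash \mathit{topspec}\ \mathit{topspec}' \Rightarrow FI + FI', I + I'}
\qquad (77)
\]

Judgement: $IB \vdash \mathit{funspec} \Rightarrow FI$

\[
\frac{\begin{array}{c}
IB \vdash \mathit{sigexp} \Rightarrow I \qquad IB + \{\mathit{strid} \mapsto I\} \vdash \mathit{sigexp}' \Rightarrow I' \\
FI := \{\mathit{funid} \mapsto (I, I')\} \qquad \langle IB \vdash \mathit{funspec} \Rightarrow FI' \rangle
\end{array}}
{IB \vdash \mathit{funid}(\mathit{strid} : \mathit{sigexp}) : \mathit{sigexp}'\ \langle \texttt{and}\ \mathit{funspec} \rangle \Rightarrow FI\ \langle + FI' \rangle}
\qquad (78)
\]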
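
The way the sequencing rules for `topdec topdec'` thread the state and propagate exception packets can be sketched in Standard ML. This is only an illustrative model under stated assumptions: a basis is an association list, the state is an integer counter, a packet is a string, and the names `result`, `seq`, `bind`, and `raiseP` are hypothetical.

```sml
(* An evaluation yields either a basis-like value or a packet. *)
datatype 'b result = Ok of 'b | Packet of string

(* seq runs eval1, then eval2 in the environment extended with eval1's
   bindings; a packet from either step is returned with the state reached. *)
fun seq (eval1, eval2) (s, env) =
  case eval1 (s, env) of
    (Packet p, s') => (Packet p, s')
  | (Ok b, s') =>
      (case eval2 (s', env @ b) of
         (Packet p, s'') => (Packet p, s'')
       | (Ok b', s'') => (Ok (b @ b'), s''))

(* Two toy evaluators: one binds a name, one raises; each bumps the state. *)
fun bind (x, v) (s, _) = (Ok [(x, v)], s + 1)
fun raiseP msg (s, _) = (Packet msg, s + 1)

val r1 = seq (bind ("a", 1), bind ("b", 2)) (0, [])
val r2 = seq (raiseP "boom", bind ("b", 2)) (0, [])
val r3 = seq (bind ("a", 1), raiseP "boom") (0, [])
```

The three results correspond to the three cases of the rules: both declarations succeed, the first raises, or the second raises after the first has already updated the state.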


E  TD  Linking  Rules 

TD  defines  the  initial  static  basis  Bq  =  Tq,  Fq,  Gq,  Eq  [16,  Appendix  C] .  We  define  the  basis  interface 

'^basis  =  {To){Fo,Go,  Eq) 

and  assume  the  unit  identifier  basis  may  not  appear  in  source  units, 
r  h  L  ok 


r  h  uspecs  ok 
r  +  uspecs  h  exps  ok 
uspecs  =  unitidi'.Ti, . . . ,  unitidn'-Fn 
unitidi, . . . ,  unitidn  are  distinct 

r  h  uspecs  exps  ok 


(79) 


r  h  exps  ok 

r  h  udecs  :  uspecs 
r  h  udecs  :  uspecs  ok 

rh(r)(F,G,j^):Sig 
ThT,F,G,E  ok 


(80) 

(81) 


L  ^  udecs 


udecs  :  uspecs  ^  udecs 


(82) 


T  =  T basis  using  ip 
basis: T  ^  udecs  :  uspecs  ^  udecs 


(83) 


r  h  LFrL'  =P  L" 


r  h  exps-\+exps'  =P'  exps" 

r  h  {uspecsQ  exps)+l{-  exps')  ^  uspecs^  exps" 


(84) 
unitid  G  dom(uspecs) 

(r  +  uspecsQ  +  uspecs) (unitid)  =  T' ,  F' ,  G' ,  E' 

F',G',E'  :  T  using  tp 
L'  :=  ip(uspecsi  udecs'  :  uspecs') 
r  h  (uspecsQ  udecs  :  uspecs)Fi-L'  ^  L" 

r  h  (uspecsQ  udecs  :  uspecs) Ft- (unitid :T ,  uspecsi  udecs'  :  uspecs')  ^  L" 

unitid  0  dom.(exps) 
uspecsQ  =  uspecs" ,  unitid:T' ,  uspecs'" 

T'  =  T  using  p 
L'  :=  ip(uspecsi  exps') 
r  h  (uspecs Q  exps)F-L'  L" 

r  h  (uspecsQ  exp s)F- (unitid :T ,  uspecs^  exps')  ^  L" 

unitid  0  doTa.(exps)  U  dom(us]9ecso) 

T'  =  T  using  p 
r  +  BT(uspecso)  1“  TT'  :  Sig 
L  :=  uspecsQ,  unitid:T'  exps 
L'  :=  ip(uspecsi  exps') 

r  h  lftL'  =p  l" 

r  h  (uspecsQ  exp s)F- (unitid :T ,  uspecs^  exps')  ^  L" 


(85) 


(86) 


(87) 


r  h  exps-\+exps'  ^  exps" 


r  h  (udecs  :  uspecs) F- (udecs'  :  uspecs')  ^  udecs,  udecs'  :  uspecs,  uspecs' 


(88) 


r  h  bftB'  ^  b  +  b' 


(89) 


F  TD  Elaboration  Rules 

We  define  the  bound  type  names  in  a  linkset,  BT(L),  by 

BT(uspecsQ  udecs  :  uspecs)  =  BT(uspecsg)  U  BT(uspecs) 

BT(uspecsQ  B)  =  BT(uspecsg). 


project  ^  L 


(90) 
project  T bas^s  =  {T){F,G,  E)  TnBT(L)  =  0 

L  =  uspecsQ  udecs  :  uspecs 
r  :=  {{T,F,G,E),{})  +BT(uspecsQ)  +  uspecs 
r  h  srcunit  ^  uspecsi  udecs'  :  uspecs' 
h  {bas\5:T basis  1  uspecsi  udecs'  :  uspecs')  ^  L" 

project,  srcunit  L" 
project  ^  L 

BT(L)  n  BT(L')  =  0  h  L'  ok  h  L+fL'  ^  L" 
project,  L'  ^  L" 


(91) 


(92) 


r  h  srcunit  ^  L 


srcunit  =  (unit  unitid  =  unit  topdec  end) 
r  h  topdec  ^  uspecsQ  T,  F,  G,  E 

r  h  srcunit  ^  uspecsQ  srcunit  :  unitid:{T){F,  G,  E) 


(93) 


r  h  topdec  ^  L 

r  h  impexp  ^  L 
r  h  import  impexp  L 

topdec  =  strdec  T  h  topdec  :  B 
r  h  topdec  ^  ^  B 

topdec  =  sigdec  T  h  topdec  :  B 
r  h  topdec  ^  ^  B 

topdec  =  fundee  T  h  fundee  :  B 
r  h  topdec  ^  ^  B 

r  h  topdec  ^  uspecs Q  B 
r  +  BT(uspecso)  +  B  topdec'  ^  L 
B'  :=  T  of  B,  {},  {},  ^0  r  h  {uspecso  B')+^L  =>  L' 

r  h  local  topdec  in  topdec'  end  L' 

r  h  topdec  L  L  =  uspecs^  B 
r  +  BT(uspecsg)  +  B  h  topdec'  ^  L' 

r  h  lvtL'  L" 

r  h  topdec  topdec'  ^  L" 


(94) 

(95) 

(96) 

(97) 

(98) 

(99) 
r  h  impexp  ^  L 


T{unitid)  =  T,F,G,E  L  :=  unitid:{T){F,G,E)  ^  ^,F,G,E 

r  h  unitid  ^  L 


r  h  intexp  =>  {T){F,  G,  E)  L  :=  unitid:{T){F,  G,  E)  0,  F,  G,  E 
r  h  unitid  :  intexp  L 

r  h  impexp  L  F  h  impexp'  V 
BT(L)  n  BT(L')  =  0  r  h  LVrL'  L" 

r  h  impexp  impexp'  L" 


(101) 


(102) 
Figure  22:  The  dependencies  among  the  libraries  and  the  Mini-ML  interpreter.  The  interpreter 
uses  all  three  libraries  and  the  input  and  parsing  libraries  use  the  stream  library. 

G  Larger  Example:  Mini-ML 

In  this  appendix  we  present  a  more  realistic  example  based  on  code  used  in  Frank  Pfenning’s  1995 
Logic  Programming  course  at  Carnegie  Mellon.  The  example  comprises  a  stream  library,  an  input 
library,  a  parsing  library,  and  an  interpreter  for  Mini-ML  that  uses  the  libraries.®  In  Figure  22, 
we  illustrate  the  high-level  dependencies  in  the  code.  We  first  discuss  the  library  handoff  units 
and  the  interpreter.  Then  we  give  library  implementations  and  show  how  to  link  these  to  make  an 
executable  interpreter. 

Stream  Library 

The  stream  library  declares  a  signature  BASIC_STREAM  describing  the  visible  “core”  of  functional 
streams,  a  signature  STREAM  that  extends  BASIC_STREAM,  a  functor  Stream  that  lifts  a  BASIC_STREAM 
structure  to  STREAM,  and  plain  and  memoizing  STREAM  structures.  The  handoff  unit  StreamLib 
declares all of these components by importing units STREAMSIG and StreamImpl:

unit STREAMSIG =
unit
  signature BASIC_STREAM =
  sig
    type 'a stream
    val empty : 'a stream
    val create : (unit -> ('a * 'a stream) option) -> 'a stream
    val expose : 'a stream -> ('a * 'a stream) option
    (* ... *)
  end

  signature STREAM =
  sig
    include BASIC_STREAM
    exception Empty
    val delay : (unit -> 'a stream) -> 'a stream
    val lcons : 'a * (unit -> 'a stream) -> 'a stream
    val cons : 'a * 'a stream -> 'a stream
    val map : ('a -> 'b) -> 'a stream -> 'b stream
    (* ... *)
  end
end

unit StreamLib =
unit
  import STREAMSIG
  import StreamImpl :
    intf
      functor Stream (structure BasicStream : BASIC_STREAM)
        : STREAM where type 'a stream = 'a BasicStream.stream
      structure PlainStream : STREAM
      structure MemoStream : STREAM
      (* default stream package memoizes *)
      structure Stream : STREAM
        where type 'a stream = 'a MemoStream.stream
    end
end

®The libraries were written by Chris Okasaki, Frank Pfenning, and Bob Harper. Pfenning wrote the interpreter
to demonstrate the libraries to teams of students using them to implement compilers for the course.

The other libraries and the interpreter import StreamLib, heavily using its structure
Stream : STREAM but never directly referring to the auxiliary units STREAMSIG and StreamImpl. We shall
give a stream implementation StreamImpl that, like StreamLib, performs IC against STREAMSIG
to obtain the stream signatures.
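
To make the BASIC_STREAM signature concrete, here is a minimal plain (non-memoizing) implementation in ordinary Standard ML. The source elides the library's actual code, so the structure `PlainBasic` and the helpers `fromList` and `take` are assumptions made for illustration; only the signature and the definition of `delay` follow the listings in this appendix.

```sml
signature BASIC_STREAM =
sig
  type 'a stream
  val empty  : 'a stream
  val create : (unit -> ('a * 'a stream) option) -> 'a stream
  val expose : 'a stream -> ('a * 'a stream) option
end

(* Hypothetical plain implementation: a stream is a suspended computation. *)
structure PlainBasic :> BASIC_STREAM =
struct
  datatype 'a stream = Stream of unit -> ('a * 'a stream) option
  val empty = Stream (fn () => NONE)
  fun create f = Stream f
  fun expose (Stream f) = f ()
end

(* Build a stream from a list; elements are produced on demand. *)
fun fromList [] = PlainBasic.empty
  | fromList (x :: xs) = PlainBasic.create (fn () => SOME (x, fromList xs))

(* Force at most n elements of a stream into a list. *)
fun take (0, _) = []
  | take (n, s) =
      (case PlainBasic.expose s of
         NONE => []
       | SOME (x, s') => x :: take (n - 1, s'))

(* delay as defined in the Stream functor: postpone building the stream. *)
fun delay t = PlainBasic.create (PlainBasic.expose o t)

val firstTwo = take (2, delay (fn () => fromList [1, 2, 3]))
```

Because the structure is sealed with `:>`, clients can only observe streams through `expose`, which is what lets a memoizing implementation be substituted without changing client code.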

Input  Library 

The input library declares a signature INPUT and structure Input : INPUT. Structure Input creates
character streams. For example, the function

Input.promptkeybd : string -> char Stream.stream

creates a stream that, when forced, prompts the user for input a line at a time. The handoff unit
InputLib imports units INPUTSIG and InputImpl:
unit INPUTSIG =
unit
  local
    import StreamLib
  in
    signature INPUT =
    sig
      val readfile : string -> char Stream.stream
      val readkeybd : unit -> char Stream.stream
      val promptkeybd : string -> char Stream.stream
    end
  end
end

unit InputLib =
unit
  import INPUTSIG
  import InputImpl :
    intf
      structure Input : INPUT
    end
end

This description of the input library performs IC against the stream library description (unit
StreamLib) but SC against its implementation (unit StreamImpl).

Parsing  Library 

The parsing library declares a signature POS and structure Pos : POS for positions within a file, a
signature BASIC_PARSING describing a "core" set of parsing combinators, a signature PARSING that
extends BASIC_PARSING with additional combinators and utilities, a functor Parsing that lifts a
BASIC_PARSING structure to PARSING, and, finally, a structure Parsing : PARSING. The parsing
library uses the stream library. For example,

Pos.markstream : char Stream.stream -> (char * Pos.pos) Stream.stream

marks a character stream with position information for reporting errors and

Parsing.transform : ('a,'t) Parsing.parser
                    -> ('t * Pos.pos) Stream.stream
                    -> 'a Stream.stream

creates a stream of parsed values from a parser and a marked stream of tokens. Following the now
familiar pattern, the handoff unit ParsingLib imports PARSINGSIG and ParsingImpl:
unit PARSINGSIG =
unit
  local
    import StreamLib
  in
    signature POS =
    sig
      type pos
      val markstream : char Stream.stream -> (char * pos) Stream.stream
      (* ... *)
      val makestring : pos -> string
    end

    signature BASIC_PARSING =
    sig
      (* Parser with token type 't, result type 'a *)
      type ('a,'t) parser
      type pos  (* = Pos.pos *)

      val succeed : 'a -> ('a,'t) parser
      val fail : ('a,'t) parser
      val done : 'a -> ('a,'t) parser
      val any : ('t,'t) parser

      val -- : ('a,'t) parser * ('a -> ('b,'t) parser) -> ('b,'t) parser
      val ## : ('a,'t) parser * (pos -> ('a,'t) parser) -> ('a,'t) parser
      val !! : ('a,'t) parser -> ('a * pos,'t) parser
      val $ : (unit -> ('a,'t) parser) -> ('a,'t) parser

      val lookahead : ('a,'t) parser -> ('a -> ('b,'t) parser)
                      -> ('b,'t) parser

      val parse : ('a,'t) parser -> ('t * pos) Stream.stream -> 'a option
      val transform : ('a,'t) parser -> ('t * pos) Stream.stream
                      -> 'a Stream.stream
    end

    signature PARSING =
    sig
      include BASIC_PARSING
      val && : ('a,'t) parser * ('b,'t) parser -> ('a * 'b,'t) parser
      val || : ('a,'t) parser * ('a,'t) parser -> ('a,'t) parser
      (* ... many more ... *)
    end
  end
end

unit ParsingLib =
unit
  import PARSINGSIG
  import ParsingImpl :
    intf
      structure Pos : POS
      functor Parsing (structure BasicParsing : BASIC_PARSING
                         where type pos = Pos.pos)
        : PARSING
            where type ('a,'t) parser = ('a,'t) BasicParsing.parser
            where type pos = Pos.pos
      structure Parsing : PARSING
        where type pos = Pos.pos
    end
end
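
The core combinators named in BASIC_PARSING can be illustrated with a small list-based sketch in Standard ML. It is not the library's implementation, which the source elides: here a parser consumes a plain token list rather than a position-marked stream, and positions are omitted entirely. The representation and the example parser `pair` are assumptions.

```sml
(* Assumed representation: a parser maps a token list to an optional
   result paired with the remaining tokens. *)
type ('a, 't) parser = 't list -> ('a * 't list) option

fun succeed x = fn ts => SOME (x, ts)
val fail : ('a, 't) parser = fn _ => NONE

(* any accepts exactly one token and returns it. *)
fun any [] = NONE
  | any (t :: ts) = SOME (t, ts)

(* Sequencing: run p, then feed its result to f to obtain the next parser. *)
infix 2 --
fun p -- f = fn ts =>
  (case p ts of
     NONE => NONE
   | SOME (x, ts') => f x ts')

(* Example: read two tokens and return them as a pair. *)
fun pair ts = (any -- (fn x => any -- (fn y => succeed (x, y)))) ts

val ok = pair [#"a", #"b", #"c"]
val short = pair [#"a"]
```

In this sketch `ok` carries the pair of the first two characters together with the unconsumed rest, and `short` fails because the input runs out before the second `any`.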

Interpreter

The Mini-ML interpreter is a single unit Interp that declares signatures and structures for
identifiers, lexical tokens, an abstract syntax, lexical analysis, parsing, evaluation, and a read-eval-print
loop. It performs IC against the handoff units StreamLib, InputLib, and ParsingLib and units
BoolLib and IntLib (not shown but assumed to declare the Standard Basis Library structures
Bool and Int).

unit Interp =
unit
  import StreamLib InputLib ParsingLib IntLib BoolLib

  signature ID =
  sig
    eqtype id
    val id : string -> id
    val eq : id * id -> bool
    val makestring : id -> string
  end

  structure Id :> ID = struct (* ... *) end

  signature TOKEN =
  sig
    datatype token =
        Id of Id.id
      | Num of int
      | LParen | RParen
      | Plus | Times | Neg | Eql
      | Fn | Rec | Is
      | LAngle | RAngle | Comma | Fst | Snd
      | True | False | If | Then | Else
      | Let | Be | In
      | Semi

    val makestring : token -> string
  end

  structure Token :> TOKEN = struct (* ... *) end

  signature ABSSYN =
  sig
    datatype abssyn =
        Var of Id.id
      | Num of int
      | Bool of bool
      (* ... *)

    val makestring : abssyn -> string
  end

  structure AbsSyn :> ABSSYN = struct (* ... *) end

  signature LEXER =
  sig
    val lex_item : (Token.token * Pos.pos, char) Parsing.parser
    val lex :
      (char * Pos.pos) Stream.stream ->
      (Token.token * Pos.pos) Stream.stream
  end

  structure Lexer :> LEXER = struct (* ... *) end

  signature PARSER =
  sig
    val parse_prog : (AbsSyn.abssyn, Token.token) Parsing.parser
    val parse : (Token.token * Pos.pos) Stream.stream
                -> AbsSyn.abssyn Stream.stream
  end

  structure Parser :> PARSER = struct (* ... *) end

  signature EVALUATOR =
  sig
    val evaluate : AbsSyn.abssyn Stream.stream
                   -> AbsSyn.abssyn Stream.stream
  end

  structure Evaluator :> EVALUATOR = struct (* ... *) end

  signature INTERPRETER =
  sig
    val interpreter : char Stream.stream -> string Stream.stream
    val interact : unit -> unit
  end

  structure Interpreter :> INTERPRETER =
  struct
    val interpreter =
      (Stream.map AbsSyn.makestring) o
      (Evaluator.evaluate) o
      (Parser.parse) o
      (Lexer.lex) o
      (Pos.markstream)

    fun interact () =
      let
        fun loop s =
          (case Stream.expose s of
             NONE => ()
           | SOME (result, s') =>
               (print (result ^ "\n"); loop s'))
        val is = interpreter (Input.promptkeybd "> ")
      in
        loop is
      end
  end
end

The interpreter is organized around the stream library. For example, the Mini-ML evaluator
has type

Evaluator.evaluate : AbsSyn.abssyn Stream.stream
                     -> AbsSyn.abssyn Stream.stream

and maps a stream of expressions into a stream of values, filtering out those expressions whose
evaluation gets stuck (and printing an error message). This organization lends itself well to
experimenting and debugging. A one-line change

val interpreter =
  (Stream.map AbsSyn.makestring) o
  (* (Evaluator.evaluate) o *)
  (Parser.parse) o
  (Lexer.lex) o
  (Pos.markstream)

produces a variant of the interpreter that accepts syntactically correct Mini-ML expressions and
prints the resulting abstract syntax.
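
The reason a stage can be dropped by commenting out one line is that it maps a type to itself, so the composition still type-checks without it. The following Standard ML sketch shows the same idea with lists standing in for streams; the stages `lex`, `parse`, `evaluate`, and `show` are toy stand-ins and not the Mini-ML components.

```sml
(* Toy pipeline stages (assumptions for illustration only). *)
val lex = String.tokens Char.isSpace          (* string -> string list *)
val parse = List.mapPartial Int.fromString    (* string list -> int list *)
val evaluate = map (fn n => n * n)            (* int list -> int list *)
val show = map Int.toString                   (* int list -> string list *)

(* Full pipeline, composed right to left as in the interpreter. *)
val interpreter = show o evaluate o parse o lex

(* Variant with the evaluation stage removed: still well typed because
   evaluate has the same input and output type. *)
val parserOnly = show o parse o lex

val full = interpreter "3 4"
val parsed = parserOnly "3 4"
```

Here `full` holds the squared values as strings while `parsed` holds the parsed input unchanged, analogous to printing abstract syntax instead of evaluated results.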

Library  Implementations 

Unit StreamImpl performs IC against the unit STREAMSIG:

unit StreamImpl =
unit
  import STREAMSIG

  functor Stream (structure BasicStream : BASIC_STREAM)
    :> STREAM where type 'a stream = 'a BasicStream.stream =
  struct
    open BasicStream
    exception Empty
    fun delay t = create (expose o t)
    (* ... *)
  end

  structure BasicStream :> BASIC_STREAM =
    struct (* ... *) end

  structure PlainStream : STREAM =
    Stream (structure BasicStream = BasicStream)

  structure BasicMemoStream :> BASIC_STREAM =
    struct (* ... *) end

  structure MemoStream : STREAM =
    Stream (structure BasicStream = BasicMemoStream)

  (* default stream package memoizes *)
  structure Stream = MemoStream
end
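
The body of the memoizing basic stream structure is elided in the source. One plausible way to memoize, sketched here in Standard ML as an assumption rather than as the library's code, is to cache the result of the first `expose` in a reference cell. The structure name `MemoBasic` and the counter `forced` are hypothetical.

```sml
structure MemoBasic =
struct
  (* A cell holds either an unevaluated thunk or its cached result. *)
  datatype 'a cell = Thunk of unit -> ('a * 'a stream) option
                   | Value of ('a * 'a stream) option
  and 'a stream = Empty | Stream of 'a cell ref

  val empty = Empty
  fun create f = Stream (ref (Thunk f))

  (* expose runs the thunk at most once and remembers the answer. *)
  fun expose Empty = NONE
    | expose (Stream r) =
        (case !r of
           Value v => v
         | Thunk f => let val v = f () in r := Value v; v end)
end

(* Count how many times the suspended computation is actually run. *)
val forced = ref 0
val s = MemoBasic.create
          (fn () => (forced := !forced + 1; SOME (42, MemoBasic.empty)))
val first = MemoBasic.expose s
val second = MemoBasic.expose s
```

After both calls to `expose`, the counter shows that the thunk ran only once, which matters for streams such as keyboard input whose production has side effects.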

Unit InputImpl performs IC against INPUTSIG and StreamLib but SC against StreamImpl. It
also needs a way to perform I/O: we assume the unit TextIOLib (not shown) declares the Standard
Basis Library structure TextIO.

unit InputImpl =
unit
  local
    import INPUTSIG StreamLib TextIOLib
  in
    structure Input :> INPUT = struct (* ... *) end
  end
end
We declared everything specified by StreamLib in StreamImpl. Here we demonstrate an
alternative. Unit ParsingImpl imports structure Pos and functor Parsing from auxiliary units
PosImpl and ParsingFunImpl:

unit PosImpl =
unit
  local
    import PARSINGSIG StreamLib
  in
    structure Pos :> POS = struct (* ... *) end
  end
end

unit ParsingFunImpl =
unit
  local
    import PARSINGSIG StreamLib PosImpl
  in
    functor Parsing (structure BasicParsing : BASIC_PARSING
                       where type pos = Pos.pos) : PARSING =
      struct (* ... *) end
  end
end

unit ParsingImpl =
unit
  local
    import PARSINGSIG StreamLib
  in
    import PosImpl ParsingFunImpl

    structure BasicParsing :> BASIC_PARSING where type pos = Pos.pos =
      struct (* ... *) end

    structure Parsing =
      Parsing (structure BasicParsing = BasicParsing)
  end
end

Linking 

To create an executable interpreter, we shall link unit Interp with library implementations and
the unit

unit RunInterp =
unit
  import Interp
  val () = Interpreter.interact ()
end

that gets things started. We begin by compiling Interp and RunInterp:

L_R = link(STREAMSIG, StreamLib,
           INPUTSIG, InputLib,
           PARSINGSIG, ParsingLib,
           IntLib, BoolLib,
           Interp,
           RunInterp).

We then compile the libraries:

L_S = link(STREAMSIG, StreamImpl)
L_I = link(STREAMSIG, StreamLib, INPUTSIG, TextIOLib, InputImpl)
L_P = link(STREAMSIG, StreamLib,
           PARSINGSIG, PosImpl, ParsingFunImpl, ParsingImpl)

Finally, we complete the linkset

L = link(L_S, L_I, L_P, L_R)

to an executable program. When run, the program will prompt the user, parse Mini-ML
expressions from its input, evaluate these in turn, and print the resulting values (provided evaluation
terminates). The following typescript demonstrates the program in action.

> true+3;
Run-time error arithmetic type error
> let sq be fn x in x*x in sq(sq 4);
256
> let fib be rec f(x) is
>   if x=0 then 1
>   else if x=1 then 1
>   else f(x + ~1) + f(x + ~2)
> in fib 7;
21
>
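
The two numeric results in the typescript can be cross-checked by transcribing the Mini-ML programs into Standard ML. This is only a sanity check of the arithmetic, not part of the interpreter; the names `sq` and `fib` follow the typescript.

```sml
(* Standard ML transcriptions of the two Mini-ML programs above. *)
fun sq x = x * x

fun fib x =
  if x = 0 then 1
  else if x = 1 then 1
  else fib (x + ~1) + fib (x + ~2)

val sqResult = sq (sq 4)
val fibResult = fib 7
```

With both base cases equal to 1, the sequence runs 1, 1, 2, 3, 5, 8, 13, 21, so the seventh index yields the value shown in the typescript.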

An implementation could avoid unnecessary recompilation, maintaining a repository of compiled
code during IC. For example, such an implementation might employ

L_0 = link(STREAMSIG)
L_1 = link(L_0, StreamImpl)
L_2 = link(L_0, StreamLib)
L_3 = link(L_2, INPUTSIG, TextIOLib, InputImpl)
L_4 = link(L_2, PARSINGSIG, PosImpl, ParsingFunImpl, ParsingImpl)

to avoid recompiling STREAMSIG and StreamLib while producing L_S = L_1, L_I = L_3, and L_P = L_4.
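
The claim that the incremental linksets coincide with the ones built from scratch can be illustrated with a toy model in Standard ML. The model is a strong simplification and an assumption throughout: a linkset is just an ordered list of unit names, and `link` merely appends the units of its arguments while skipping names already present. The real linking judgement also checks interfaces, which this sketch ignores.

```sml
(* An argument to link is either a source unit or an existing linkset. *)
datatype item = Unit of string | Linkset of string list

(* Toy link: concatenate unit names in order, skipping ones already linked. *)
fun link items =
  let
    fun names (Unit u) = [u]
      | names (Linkset l) = l
    fun add (item, acc) =
      acc @ List.filter
              (fn u => not (List.exists (fn v => v = u) acc))
              (names item)
  in
    foldl add [] items
  end

(* From-scratch linksets for the stream and input libraries. *)
val lS = link [Unit "STREAMSIG", Unit "StreamImpl"]
val lI = link [Unit "STREAMSIG", Unit "StreamLib", Unit "INPUTSIG",
               Unit "TextIOLib", Unit "InputImpl"]

(* Incremental construction that reuses earlier linksets. *)
val l0 = link [Unit "STREAMSIG"]
val l1 = link [Linkset l0, Unit "StreamImpl"]
val l2 = link [Linkset l0, Unit "StreamLib"]
val l3 = link [Linkset l2, Unit "INPUTSIG", Unit "TextIOLib", Unit "InputImpl"]
```

Under this model the incremental and from-scratch results are equal lists, which reflects the reuse of compiled STREAMSIG and StreamLib described above.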